OpenCL maturity under Windows

I am considering using OpenCL in a consumer product that is currently under development.
After doing some research, I found that support under Mac OS X is generally good. Linux support is also relatively good, but my target audience does not use Linux. That leaves checking how well it is supported on Windows.
Regarding Windows, I found a question about OpenCL distribution that raises some concerns.
Do any of you have experience with using OpenCL in consumer-oriented products under Windows? I am mostly interested in the GPU side of OpenCL, specifically driver support.

Just like CUDA or Stream, OpenCL needs to be supported by the driver. Most CUDA-capable GPUs support OpenCL with a reasonably up-to-date driver (compute capability 1.0 and upwards).
In fact, if you compile against, say, CUDA SDK 4.1, your end users will need newer drivers than if you had used OpenCL.
Also, OpenCL is not bound to any particular GPU architecture. While this might be problematic for algorithms designed around a specific architecture, it shouldn't have much impact on normal end-user programs.
With CUDA, you can only compile code optimized for the major architecture versions known at build time. Compiling OpenCL kernels on the end user's machine, by contrast, might allow optimizations for binary specifications that appear in the future.
The crashes the author of that question reported for NVIDIA's OpenCL implementation generally seem to happen when resources are not freed properly. I saw similar crashes until I fixed a leak that failed to release created kernels.
I'm not saying that's the only reason it might crash, but apart from programmer errors it appears fairly stable to me.
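As a concrete illustration of both points, building kernels on the end user's machine and releasing them properly, here is a minimal sketch against the standard OpenCL 1.x host API; the kernel source is a placeholder and error handling is omitted:

    #include <CL/cl.h>

    /* Placeholder kernel; compiled on the end user's machine, so the
     * driver can optimize for whatever hardware is actually present. */
    static const char *src =
        "__kernel void scale(__global float *buf, float f) {"
        "    buf[get_global_id(0)] *= f;"
        "}";

    int main(void) {
        cl_platform_id platform;
        cl_device_id device;
        clGetPlatformIDs(1, &platform, NULL);
        clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);

        cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);
        cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
        clBuildProgram(prog, 1, &device, NULL, NULL, NULL); /* runtime compile */
        cl_kernel kern = clCreateKernel(prog, "scale", NULL);

        /* ... set arguments, enqueue the kernel, read back results ... */

        /* Release everything that was created: leaking kernels is exactly
         * the kind of bug behind the crashes described above. */
        clReleaseKernel(kern);
        clReleaseProgram(prog);
        clReleaseContext(ctx);
        return 0;
    }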

AMD and NVIDIA both support OpenCL on most (all?) of their GPUs.
Unfortunately, Intel only supports it on the CPU, which is a bit pointless: if you have to insist that the user has a separate GPU for your app, you can just as well insist that they have an NVIDIA one and use CUDA. This has limited the uptake of OpenCL.

Related

Can userspace code leverage NVIDIA's open-sourcing of their kernel modules?

NVIDIA has recently announced that they are open-sourcing (a variant of) their GPU Linux kernel driver. They are not, however, open-sourcing the user-mode driver libraries (e.g. libcuda.so).
It's a gradual process and not all GPUs are supported initially, but regardless of these details: is there some way that developers of user-space code can leverage this open-sourcing? Or is it only interesting/useful for kernel developers?
What I would personally love to be able to do is avoid having to make libcuda calls to get the current context. If that piece of information were now somehow readable from userspace, that would be neat. Of course, that's just wishful thinking on my part; I don't know how to check what the driver directly "exposes", if anything.
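For reference, the libcuda call in question would be something like the Driver API's cuCtxGetCurrent; a minimal sketch of the dependency the asker would like to avoid (assuming the CUDA toolkit headers are installed):

    #include <stdio.h>
    #include <cuda.h>   /* CUDA Driver API; links against the closed user-mode libcuda.so */

    int main(void) {
        CUcontext ctx = NULL;
        cuInit(0);              /* must precede any other Driver API call */
        cuCtxGetCurrent(&ctx);  /* the user-mode library call in question */
        printf("current context: %p\n", (void *)ctx);
        return 0;
    }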

What does M1 mac optimization process for an application mean?

You know the ARM-based M1 chips that are used in modern Mac computers. On those Macs, some software runs through a translation layer called Rosetta (Discord, Steam), some runs natively on the M1 (Slack, IntelliJ), and some doesn't work either way (VirtualBox). A huge list tracking the status can be found here.
Apps that can only be run through Rosetta are not yet M1-optimized; their developers have to optimize them, and that takes time. But what does it mean to optimize an app? What does the process look like? I'm quite sure they don't rewrite the whole application in another language (like Swift), because JetBrains was able to M1-optimize their apps quite quickly. On the other hand, Discord is not yet optimized, and the same goes for the Unity game engine (though its support is in beta).
Fundamentally, it just means that the compiler's backend was configured to emit ARM64 instructions for the program instead of (or in addition to) x86-64 instructions.
This means that certain x86-64-specific instructions can no longer be used, unless equivalent ARM instructions are substituted.
This usually isn't much of a problem, though, because most macOS software is written at a higher level of abstraction, using system-provided frameworks.
For example, using CoreImage to manipulate images abstracts you from the details of the CPU and GPU. In such cases, Apple does the heavy lifting of porting over their frameworks. All you have to do as an application developer is check a box that says "target ARM64".
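Concretely, "targeting ARM64" just means asking the compiler for a different backend. A minimal sketch (the -arch flags mentioned in the comment are macOS clang conventions for building a universal binary containing both slices):

    #include <stdio.h>

    /* Build a universal binary containing both instruction sets with, e.g.:
     *   clang -arch x86_64 -arch arm64 -o hello hello.c
     * macOS picks the matching slice at launch time. */
    int main(void) {
    #if defined(__aarch64__)
        printf("running as native ARM64 code\n");
    #elif defined(__x86_64__)
        printf("running as x86-64 code (translated by Rosetta 2 on an M1 Mac)\n");
    #endif
        return 0;
    }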

What is it meant by "developers must optimise their apps to run on ARM-based processors"?

This is a subject that I am not very knowledgeable about, and I was hoping to get a better understanding of the topic.
I was going through articles about Apple's transition to Apple Silicon and at some point I read "Apple is going to ship Rosetta 2, an emulation layer that lets you run old apps on new Macs."
As far as I know, an application is written in a high-level language (e.g. C/C++, Java, etc.). The compiler (let's assume interpreters don't exist for a moment) reads that code and translates it to assembly code, and the assembler then converts the assembly code to machine code, which is readable by the processor.
My question is, assuming the above is correct: why is Rosetta 2 required, since a CPU is supposed to translate high-level code into readable machine code anyway? Why would developers need to "optimise" their applications (or care what processor they run on) when applications are written (mostly) in a high-level language? I don't get why programmers would care, if the CPU is supposed to handle compiling and assembling.
This question is probably rather trivial but I couldn't find what I was looking for just by reading about compilers or CPU architecture.
"a CPU is supposed to translate high level code into readable machine code anyway?"
No, the CPU doesn't do that itself, it happens via software running on the CPU (JIT or ahead-of-time compiler).
With ahead-of-time compilation (e.g. normal C++ implementations), closed-source software ships only x86 machine code, not source, so you can't just recompile it yourself. Open-source software is usually easily portable by recompiling.
"Rewritten" is an overstatement for most apps; most can simply be recompiled.
But if you have custom x86-specific code, like manually vectorized SIMD loops using SSE / AVX intrinsics or hand-written asm, you'd have to port those to NEON / AArch64 SIMD.
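For illustration, here is what such a port might look like for a trivial manually vectorized loop; a sketch, not taken from any particular app, with the SSE version using immintrin.h intrinsics and the AArch64 version using their rough NEON equivalents:

    #include <stddef.h>

    #if defined(__x86_64__)
    #include <immintrin.h>  /* SSE intrinsics */
    /* Add four floats per iteration; n is assumed to be a multiple of 4. */
    void add_arrays(float *dst, const float *a, const float *b, size_t n) {
        for (size_t i = 0; i < n; i += 4) {
            __m128 va = _mm_loadu_ps(a + i);
            __m128 vb = _mm_loadu_ps(b + i);
            _mm_storeu_ps(dst + i, _mm_add_ps(va, vb));
        }
    }
    #elif defined(__aarch64__)
    #include <arm_neon.h>   /* NEON intrinsics */
    /* The same loop ported to NEON: same vector width, different names. */
    void add_arrays(float *dst, const float *a, const float *b, size_t n) {
        for (size_t i = 0; i < n; i += 4) {
            float32x4_t va = vld1q_f32(a + i);
            float32x4_t vb = vld1q_f32(b + i);
            vst1q_f32(dst + i, vaddq_f32(va, vb));
        }
    }
    #endif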

OpenCL programming in Charm++

Is it possible to run OpenCL through Charm++, while retaining the same fault tolerance and load balancing capabilities as for CPU or CUDA?
I did not explicitly see anything mentioned in the tutorials or the book.
Background: I'm one of the core developers of Charm++.
It's not clear whether you mean compiling OpenCL code to a Charm++-based parallel program, or calling kernels written in OpenCL from Charm++ code. Regardless, there is nothing explicitly implemented to support either of those cases at present.
Compiling OpenCL to Charm++ would be a large project. I don't know of anyone proposing to do such a thing, but it's not fundamentally implausible.
The research group behind Charm++, the Parallel Programming Laboratory, has looked at the possibility of implementing OpenCL support to match our offload support for CUDA-based accelerators. This would not be particularly hard. At present, however, we have no demand for it from the grant-funded projects that support our work. We would welcome contributions of code to do this. There is also the possibility that commercial development may lead to it getting implemented.

List of OpenCL compliant CPU/GPU

How can I know which CPUs can be programmed with OpenCL?
For example, the Pentium E5200.
Is there a way to know without running and querying it?
OpenCL compatibility can generally be determined by checking the vendors' sites. AMD's APP SDK requires CPUs to support at least SSE2. They also have a list of currently supported ATI/AMD video cards.
The most official source is probably the Khronos conformance list:
http://www.khronos.org/conformance/adopters/conformant-products#opencl
For compatibility with the AMD APP SDK: http://developer.amd.com/gpu/AMDAPPSDK/pages/DriverCompatibility.aspx
For NVIDIA, anything that supports CUDA should support their implementation of OpenCL:
http://www.nvidia.com/object/cuda_gpus.html
For compatibility with the Intel OpenCL SDK, look at:
https://software.intel.com/en-us/articles/opencl-code-builder-release-notes
Here is the list of conforming OpenCL products from the Khronos site:
http://www.khronos.org/conformance/adopters/conformant-products/
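If running and querying is acceptable after all, the most direct way to find out is to enumerate platforms and devices through the OpenCL host API itself; a minimal sketch (error handling omitted):

    #include <stdio.h>
    #include <CL/cl.h>

    int main(void) {
        cl_platform_id platforms[8];
        cl_uint nplat = 0;
        clGetPlatformIDs(8, platforms, &nplat);

        for (cl_uint p = 0; p < nplat && p < 8; ++p) {
            cl_device_id devices[8];
            cl_uint ndev = 0;
            clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_ALL, 8, devices, &ndev);

            for (cl_uint d = 0; d < ndev && d < 8; ++d) {
                char name[256];
                clGetDeviceInfo(devices[d], CL_DEVICE_NAME, sizeof(name), name, NULL);
                printf("platform %u, device %u: %s\n", p, d, name);
            }
        }
        return 0;
    }

If no platform or device shows up, no OpenCL implementation is installed for that hardware.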
Intel has an OpenCL SDK for Windows now, too: http://software.intel.com/en-us/articles/intel-opencl-sdk/
One more note about Intel: they now support OpenCL not only under Windows but also under Linux, though as part of a commercial SDK; see https://software.intel.com/en-us/intel-media-server-studio.
Another alternative for OpenCL development under Linux is Beignet, an open-source OpenCL project maintained by Intel China:
http://www.freedesktop.org/wiki/Software/Beignet/
I have tested it on Linux and it works as described in the tutorial; however, the compiler it uses is completely different from the one under Windows.
Well, for the CPU, AMD's SDK is supposed to work on any x86 (even Intel's x86), so that covers most of your options.
And for the GPU, I think almost all cards made in the last couple of years should run OpenCL kernels. I don't know of a particular list.
EDIT: Looks like AMD removed the original SDK pages with no replacement. There are unofficial mirrors for Windows and Linux, but I haven't tried them.
