Offline compilation of AMD and NVIDIA OpenCL kernels without cards installed

I am trying to figure out a way to perform offline compilation of OpenCL kernels without having graphics cards installed. I have installed the SDKs.
Does anyone have experience compiling OpenCL kernels without the graphics card installed, for either NVIDIA or AMD?
I asked a similar question on the AMD forums
(http://devgurus.amd.com/message/1284379).
The NVIDIA forums have been inaccessible for a long time, so I couldn't get any help from there.
Thanks

AMD has an OpenCL extension for compiling binaries for devices that are not present on the system. The extension is called cl_amd_offline_devices. Pass the property CL_CONTEXT_OFFLINE_DEVICES_AMD when creating a context and all of AMD's supported devices are reported and can be used to create binaries as if they were present on the system.
Check out their OpenCL programming guide at http://developer.amd.com/tools/hc/AMDAPPSDK/assets/AMD_Accelerated_Parallel_Processing_OpenCL_Programming_Guide.pdf for more info.
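For reference, this is roughly what using the extension looks like from the host API. It is a minimal sketch, assuming the AMD APP SDK headers and ICD are installed and that the AMD platform is the first one returned; error checking is omitted.

```c
/* Sketch: expose AMD's offline devices via cl_amd_offline_devices.
 * Assumes the AMD APP SDK is installed; CL_CONTEXT_OFFLINE_DEVICES_AMD
 * is defined in <CL/cl_ext.h>. */
#include <stdio.h>
#include <stdlib.h>
#include <CL/cl.h>
#include <CL/cl_ext.h>

int main(void)
{
    cl_platform_id platform;
    clGetPlatformIDs(1, &platform, NULL);   /* assumed: AMD platform is first */

    /* Asking for offline devices makes every device the SDK supports visible,
       not just the hardware physically present in the machine. */
    cl_context_properties props[] = {
        CL_CONTEXT_PLATFORM,            (cl_context_properties)platform,
        CL_CONTEXT_OFFLINE_DEVICES_AMD, (cl_context_properties)1,
        0
    };

    cl_int err;
    cl_context ctx = clCreateContextFromType(props, CL_DEVICE_TYPE_ALL,
                                             NULL, NULL, &err);

    /* Enumerate every device the context knows about, including offline ones. */
    size_t bytes = 0;
    clGetContextInfo(ctx, CL_CONTEXT_DEVICES, 0, NULL, &bytes);
    cl_device_id *devs = malloc(bytes);
    clGetContextInfo(ctx, CL_CONTEXT_DEVICES, bytes, devs, NULL);

    size_t n = bytes / sizeof(cl_device_id);
    for (size_t i = 0; i < n; ++i) {
        char name[256];
        clGetDeviceInfo(devs[i], CL_DEVICE_NAME, sizeof(name), name, NULL);
        printf("offline-capable device %zu: %s\n", i, name);
    }

    /* Build kernel source against any of these devices with clBuildProgram,
       then fetch the result via CL_PROGRAM_BINARIES. */
    free(devs);
    clReleaseContext(ctx);
    return 0;
}
```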

No need for a graphics card; you can compile OpenCL programs for the CPU too. If you have an Intel or AMD CPU this works: download the latest OpenCL SDK from the corresponding vendor's website and compile the OpenCL program (see the sketch after the links below):
Intel OpenCL SDK
AMD APP
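Once a CPU runtime is installed, the CPU shows up as an ordinary OpenCL device and you can build kernels against it and save the resulting binary. Below is a minimal sketch; the kernel file name kernel.cl is an assumption and error checking is omitted.

```c
/* Sketch: build an OpenCL kernel for the CPU device and dump the binary.
 * Assumes a CPU runtime (Intel or AMD) is installed and that kernel.cl
 * exists in the working directory. */
#include <stdio.h>
#include <stdlib.h>
#include <CL/cl.h>

int main(void)
{
    cl_platform_id platform;
    clGetPlatformIDs(1, &platform, NULL);

    cl_device_id cpu;
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_CPU, 1, &cpu, NULL);

    cl_context ctx = clCreateContext(NULL, 1, &cpu, NULL, NULL, NULL);

    /* Read the kernel source from disk. */
    FILE *f = fopen("kernel.cl", "rb");
    fseek(f, 0, SEEK_END);
    long len = ftell(f);
    rewind(f);
    char *src = malloc(len + 1);
    fread(src, 1, (size_t)len, f);
    src[len] = '\0';
    fclose(f);

    cl_program prog = clCreateProgramWithSource(ctx, 1, (const char **)&src, NULL, NULL);
    clBuildProgram(prog, 1, &cpu, "", NULL, NULL);

    /* Retrieve the device binary so it can be cached or shipped. */
    size_t bin_size;
    clGetProgramInfo(prog, CL_PROGRAM_BINARY_SIZES, sizeof(bin_size), &bin_size, NULL);
    unsigned char *bin = malloc(bin_size);
    unsigned char *bins[] = { bin };
    clGetProgramInfo(prog, CL_PROGRAM_BINARIES, sizeof(bins), bins, NULL);

    FILE *out = fopen("kernel.bin", "wb");
    fwrite(bin, 1, bin_size, out);
    fclose(out);

    printf("wrote %zu bytes of device binary\n", bin_size);
    return 0;
}
```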

Related

SyCL ComputeCpp: issues with the matrix_multiply SDK example

I just managed to install SYCL ComputeCpp + OpenCL (from CUDA) and ran cmake to generate the samples' VS2019 sln successfully.
For now, I've only tried to run the matrix_multiply example.
It ran successfully using the Intel FPGA emulator as the default device.
Changing the device to the CPU device worked well too.
Choosing the host device took ages without finishing.
When I tried to change the device to the NVIDIA GeForce GTX 1650 Ti,
I got an exception error from ComputeCpp: RT0100, etc.
Googling a bit, I found I'd probably have to output PTX instead of SPIR.
So I regenerated the sln using -DCOMPUTECPP_BITCODE=ptx64.
After doing that, the kernel ran successfully on the NVIDIA GPU.
My first question is: is that needed because NVIDIA does not support SPIR yet at the time of this writing, but only PTX?
However, this broke the other devices, which now report:
[ComputeCpp:RT0107] Failed to create program from binary
This now happens for all devices: Intel GPU, CPU device, FPGA device (which were formerly working).
Inspecting the .sycl file, I now find SYCL_matrix_multiply_cpp_bin_nvptx64[].
My question is: how can I support NVIDIA with PTX and "normal" devices with SPIR together in the same exe? I made a menu from which the user can choose the device to play with, but now it works only for NVIDIA.
What am I doing wrong, please?
I would expect to be able to run the same .sycl code on all the devices, whether it contains PTX or SPIR. How can I manage that?
EDIT: I just tried retargeting the bitcode to spirv64, since computecpp_info told me all my devices are supposed to support it.
However, now no device works with that setting :-(

How to setup OpenCL in Cygwin for Intel GPU

I have a laptop with an Intel(R) HD Graphics 520 GPU in it. I added the OpenCL developer package to Cygwin. I found a small Mandelbrot-set calculator program for OpenCL in C on GitHub. It is written for Apple, so I modified the Makefile to use the proper headers and settings for gcc. Now the code compiles and executes nicely (a bmp file is created):
$ ./mandelbrot.exe
Device 0: GenuineIntel pthread-Intel(R) Core(TM) i5-6300U CPU @ 2.40GHz
I have two questions:
#1. How can I add (if it is possible) the Intel GPU to the /etc/OpenCL/vendors list? I tried installing, from the Intel site, the Intel CPU Runtime for OpenCL Applications for Windows OS and the Intel Graphics Technology driver package, but I do not know where to find the proper OpenCL dll to point to in the intel.icd file.
#2. In /etc/OpenCL/vendors I have found a pocl.icd file pointing to cygpocl-2.dll. I assume this is the pthread library. But it seems to me it is running only a single thread, although I have 4 cores. Should I make any modification to run it with multiple threads? I debugged the code, and it seems that since only one device is found, it runs on only one thread. In the initialization function it sets the device_work_size property so that each device processes one strip of the final bmp. But as there is only one device, the whole bmp is processed in one run (one clEnqueueNDRangeKernel and one clEnqueueReadBuffer call).
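One quick way to see what the pocl runtime actually exposes is to query the CPU device's compute unit count. The sketch below assumes pocl is the first platform reported; adjust the index if it is not.

```c
/* Sketch: print how many compute units the OpenCL CPU device reports.
 * Assumed: the pocl ICD is the first platform returned. */
#include <stdio.h>
#include <CL/cl.h>

int main(void)
{
    cl_platform_id platform;
    clGetPlatformIDs(1, &platform, NULL);

    cl_device_id dev;
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_CPU, 1, &dev, NULL);

    cl_uint units;
    clGetDeviceInfo(dev, CL_DEVICE_MAX_COMPUTE_UNITS, sizeof(units), &units, NULL);

    char name[256];
    clGetDeviceInfo(dev, CL_DEVICE_NAME, sizeof(name), name, NULL);

    printf("%s reports %u compute units\n", name, units);
    return 0;
}
```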
UPDATE
I have installed the Intel(R) Graphics – Windows* DCH Drivers package. It installed the graphics drivers. I found intelocl64.dll (at /cygdrive/c/Program Files (x86)/Common Files/Intel/Shared Libraries/intel64/intelocl64.dll). I put this whole path into the /etc/OpenCL/vendors/intel.icd file. So far, so good. Now it cannot even find the pthread device... Bah...
I don't think it is possible to use the GPU from Cygwin. I would recommend either building a native Windows binary (e.g. with Visual Studio or Intel DPC++) or using WSL. See https://github.com/intel/compute-runtime/blob/master/WSL.md for requirements.

Got Android Studio installation error

I am kinda new to Android Studio & stuff. So today, I was installing Android Studio with the SDK Manager. All was going smoothly until an error came up saying:
Unable to install Intel HAXM
Your CPU does not support required features (VT-x or SVM).
Unfortunately, your computer does not support hardware accelerated virtualization.
Here are some of your options:
Use a physical device for testing
Develop on a Windows/OSX computer with an Intel processor that
supports VT-x and NX
Develop on a Linux computer that supports VT-x or SVM
Use an Android Virtual Device based on an ARM system image (This
is 10x slower than hardware accelerated virtualization)
I've attached a pic of my system specs. Can someone please throw some light on this issue?
Thanks
It is because you have not enabled virtualization technology on your device. You need to go into the boot options (BIOS) before Windows starts and enable VT-x from there.
The option for enabling virtualization technology is located in a different place depending on the device manufacturer.
Edit: The Android Studio emulator won't run on Windows with an AMD processor. The error message is kind of misleading, as it suggests the problem is with your CPU. But it is within the troubleshoot message: "Windows/OSX computer with an Intel processor". Basically, that means it is not going to work properly in your current setup. Either try installing Linux and running Android Studio on that (which might come with its own issues), use a physical device for testing, or use the slow ARM images.
You are using an AMD processor. SVM is AMD technology and VT-x is Intel technology. So you won't be able to get VT-x to run, but SVM might be possible.
As another poster had suggested, virtualization may have been disabled in the BIOS. There may be an option to enable virtualization. It does however seem to happen that virtualization is activated in the BIOS and Android-Studio does not recognize that. I have not figured out how to fix that either.
You could use the emulator with an ARM image, which will be very slow. Alternatively, you could use another emulator that is not integrated into Android-Studio.

cuda nvcc cross compiler

I want to compile CUDA code on mac but make it executable on Windows.
Is there a way to set up an nvcc CUDA cross compiler?
The problem is that my Windows desktop will be inaccessible for a while due to traveling; however, I do not want to waste time waiting until I get back to compile the code. If I have to wait, I lose the time I could spend debugging the code and making sure it compiles correctly, and the like. My Mac is not equipped with CUDA-capable hardware, though.
The short answer, is no, it is not possible.
It is a common misconception, but nvcc isn't actually a compiler. It is a compiler driver, and it relies heavily on the host C++ compiler to steer compilation of both host and device code. To compile CUDA for Windows, you must use the Microsoft C++ compiler. That compiler can't be run on Linux or OS X, so cross compilation to a Windows target is not possible unless you are doing the compilation on a Windows host (32/64-bit cross compilation is possible, for example).
The other two CUDA platforms are equally incompatible with each other, despite both requiring gcc for compilation, because the back ends are different (Linux is an ELF platform, OS X is a Mach-O platform), so even cross compilation between OS X and Linux isn't possible.
You have two choices if compilation on the OS X platform is the goal:
Install the OS X toolkit. Even though your hardware doesn't have a compatible GPU, you can still install the toolkit and compile code.
Install the Windows toolkit and Visual Studio inside a virtual Windows installation (or a physical Boot Camp installation), and compile the code inside Windows on the Mac. Again, you don't need NVIDIA-compatible hardware to do this.
If you want to run code without a CUDA GPU, there is a non-commercial option (GPU Ocelot) and a commercial one (PGI CUDA-x86) you could investigate.

Install AMD OpenCL CPU driver with an Nvidia graphic card

I have seen this question many times but never found an answer for Windows.
I recently ported my CUDA code to OpenCL.
When testing with an ATI card, the Catalyst drivers contain a CPU OpenCL driver, hence I can run the OpenCL code on the CPU.
When testing with an NVIDIA card, there is no driver for the CPU.
Question is: how can I install (and deploy) a CPU driver when running with an Nvidia card?
Thanks a lot
To use OpenCL on the CPU you don't need any driver; you only need an OpenCL runtime that supports the CPU, which (in the case of AMD/ATI) is part of the APP SDK. It can be installed no matter what GPU you have. Your end users would also have to install the APP SDK: currently, there is no way to install the OpenCL runtime only.
If you have an Intel CPU, you should rather try the Intel OpenCL SDK, which has a separate installer. However, the AMD APP SDK works on Intel CPUs quite well, but not vice versa.
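To confirm the CPU runtime is picked up alongside the NVIDIA driver, you can enumerate all platforms and their devices; each installed runtime appears as its own platform. A minimal sketch, with error checking omitted:

```c
/* Sketch: list every OpenCL platform and device visible through the ICD loader.
 * With the AMD APP SDK (or Intel OpenCL SDK) installed next to the NVIDIA
 * driver, a CPU device should appear under its own platform. */
#include <stdio.h>
#include <CL/cl.h>

int main(void)
{
    cl_uint nplat = 0;
    clGetPlatformIDs(0, NULL, &nplat);
    cl_platform_id plats[16];
    clGetPlatformIDs(nplat, plats, NULL);

    for (cl_uint p = 0; p < nplat; ++p) {
        char pname[256];
        clGetPlatformInfo(plats[p], CL_PLATFORM_NAME, sizeof(pname), pname, NULL);
        printf("Platform %u: %s\n", p, pname);

        cl_uint ndev = 0;
        clGetDeviceIDs(plats[p], CL_DEVICE_TYPE_ALL, 0, NULL, &ndev);
        cl_device_id devs[16];
        clGetDeviceIDs(plats[p], CL_DEVICE_TYPE_ALL, ndev, devs, NULL);

        for (cl_uint d = 0; d < ndev; ++d) {
            char dname[256];
            clGetDeviceInfo(devs[d], CL_DEVICE_NAME, sizeof(dname), dname, NULL);
            printf("  Device %u: %s\n", d, dname);
        }
    }
    return 0;
}
```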
