Can I use Alea.cuBase / Alea GPU with CUDA 8.0?

I just tried to run the Alea TK samples on a machine with a GTX 1070, and:
CUDA 7.5 installs, but doesn't seem to work there. NVidia says CUDA 8.0RC should be used with this GPU: https://devtalk.nvidia.com/default/topic/949823/cuda-setup-and-installation/when-the-cuda-toolkit-will-support-gtx1070-graphics-card-/
CUDA 8.0 also installs successfully, but all the bindings in Alea.cuBase appear to target CUDA 7.5 -- basically, every sample fails when attempting to load CUDA 7.5's "cu*64_75.dll" libraries, even though the 8.0 toolkit ships equivalent ones with an "_80" suffix.
The same samples run without any issues on machines with older GPUs (and thus CUDA 7.5).
Is there any way to address this, or should I wait for an updated version of Alea.cuBase?

The GTX 1070 is built on the new Pascal GPU architecture. Starting with Alea GPU V3 beta 17 we also support Pascal, so give it a try. CUDA 8 should work as well, but you have to use the new Alea GPU version 3 beta release; the old Alea GPU v2.2 cannot compile for Pascal.

Related

How to get 64-bit addressing, full RAM access using OpenCL with 2019 MacBook Pro 16" intel/amd

I have a 2019 MacBook Pro 16". It has an 8-core Intel Core i9 and an AMD Radeon Pro 5500M with 8 GB of GPU RAM.
The laptop dual-boots macOS 12.4 and Windows 11.
Running clinfo under Windows reports OpenCL version 2.0, 64-bit addressing, and a max allocatable memory of 7-8 GB.
Running clinfo under macOS reports OpenCL version 1.2, 32-bit little-endian addressing, and a max allocatable memory of about 2 GB.
I am guessing this means that any OpenCL code I run is restricted to 2 GB because of the 32-bit addressing (though I thought that limit was 4 GB). So: a) is this true, and b) if it is, is there any way to get OpenCL under macOS to use the full amount of GPU memory?
OpenCL support on macOS is not great and has not been updated/improved for almost a decade. It always maxes out at version 1.2 regardless of hardware.
I'm not sure how clinfo determines "max allocatable memory," but if this refers to CL_DEVICE_MAX_MEM_ALLOC_SIZE, this is not necessarily a hard limit and can be overly conservative at times. 32-bit addressing may introduce a hard limit though. I'd also experiment with allocating your memory as multiple buffers rather than one giant one.
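If you want to see exactly what each runtime reports, you can query these limits per device. A small sketch using the pyopencl package (my choice of binding here is an assumption; any OpenCL binding exposes the same queries):
import pyopencl as cl  # pip install pyopencl

for platform in cl.get_platforms():
    for dev in platform.get_devices():
        print(dev.name)
        print("  address bits  :", dev.address_bits)
        print("  global memory :", dev.global_mem_size // 2**20, "MiB")
        # CL_DEVICE_MAX_MEM_ALLOC_SIZE -- likely what clinfo calls "max allocatable"
        print("  max allocation:", dev.max_mem_alloc_size // 2**20, "MiB")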
For serious GPU programming on macOS, it's hard to recommend OpenCL these days - tooling and feature support on Apple's own Metal API is much better, but of course not source compatible with OpenCL and only available on Apple's own platforms. (OpenCL is now also explicitly deprecated on macOS.)

Using PyTorch CUDA on MacBook Pro

I am using a MacBook Pro (16-inch, 2019, macOS 10.15.5 (19F96)) with these GPUs:
AMD Radeon Pro 5300M
Intel UHD Graphics 630
I am trying to use PyTorch with CUDA on my Mac.
All of the guides I found assume that I have an Nvidia graphics card.
I found this GitHub issue: https://github.com/pytorch/pytorch/issues/10657, but it looks like I would need to install ROCm, and according to its Supported Operating Systems it only supports Linux.
Is it possible to run PyTorch on the GPU with a Mac and an AMD graphics card?
No.
CUDA works only with supported NVidia GPUs, not with AMD GPUs.
There is an ongoing effort to support acceleration for AMD GPUs in PyTorch (via ROCm, which does not work on macOS).
CUDA is a GPU computing framework developed by Nvidia for Nvidia GPUs, and the same goes for the cuDNN library.
At the moment, you cannot use GPU acceleration in PyTorch with an AMD GPU, i.e. without an Nvidia GPU. The OS is not the problem; it doesn't matter that you have macOS. It is a matter of which GPU you have.
What you can do, though, is either purchase an external Nvidia GPU or use a cluster; for example, Google Colab offers PyTorch compatibility.
PyTorch now supports training using Metal.
Announcement: https://pytorch.org/blog/introducing-accelerated-pytorch-training-on-mac/
To get started, install the latest nightly build of PyTorch: https://pytorch.org/get-started/locally/
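A minimal sketch, assuming a PyTorch build that ships the MPS backend (the nightly above, or any 1.12+ release):
import torch

# Requires a PyTorch build with the Metal (MPS) backend compiled in.
if torch.backends.mps.is_available():
    device = torch.device("mps")
    x = torch.ones(4, 4, device=device)
    print((x @ x).device)  # mps:0
else:
    print("MPS backend not available; falling back to CPU")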
Answer (pre-May 2022)
Unfortunately, no GPU acceleration is available when using PyTorch on macOS. CUDA has not been available on macOS for a while, and it only runs on Nvidia GPUs. AMD's equivalent library, ROCm, requires Linux.
If you are working with macOS 12.0 or later and would be willing to use TensorFlow instead, you can use the Mac optimized build of TensorFlow, which supports GPU training using Apple's own GPU acceleration library Metal.
Currently, you need Python 3.8 (<=3.7 and >=3.9 don't work) to run it. To install, run:
pip3 install tensorflow-macos
pip3 install tensorflow-metal
You may need to uninstall existing tensorflow distributions first or work in a virtual environment.
Then you can just
import tensorflow as tf
tf.test.is_gpu_available() # should return True
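Note that tf.test.is_gpu_available() is deprecated in newer TensorFlow releases; an equivalent check there would be:
tf.config.list_physical_devices('GPU')  # should return a non-empty list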
It will be possible in four months, around March 2022. See Soumith's reply to this question on GitHub: https://github.com/pytorch/pytorch/issues/47702

Offline compilation for AMD and NVIDIA OpenCL Kernels without cards installed

I was trying to figure out a way to perform offline compilation of OpenCL kernels without graphics cards installed. I have installed the SDKs.
Does anyone have experience compiling OpenCL kernels without the graphics cards installed, for either NVIDIA or AMD?
I asked a similar question on the AMD forums
(http://devgurus.amd.com/message/1284379).
The NVIDIA forums have been inaccessible for a long time, so I couldn't get any help there.
Thanks
AMD has an OpenCL extension for compiling binaries for devices that are not present on the system. The extension is called cl_amd_offline_devices. Pass the property CL_CONTEXT_OFFLINE_DEVICES_AMD when creating a context, and all of AMD's supported devices are reported and can be used to create binaries as if they were present on the system.
Check out their OpenCL programming guide at http://developer.amd.com/tools/hc/AMDAPPSDK/assets/AMD_Accelerated_Parallel_Processing_OpenCL_Programming_Guide.pdf for more info
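For illustration, a sketch of the same idea through pyopencl. The 0x403F value for CL_CONTEXT_OFFLINE_DEVICES_AMD is taken from AMD's cl_ext.h and is an assumption to verify against your SDK headers:
import pyopencl as cl

# cl_amd_offline_devices: expose all devices the AMD runtime can target,
# not just the ones physically present. 0x403F is CL_CONTEXT_OFFLINE_DEVICES_AMD
# from AMD's cl_ext.h (assumption -- check your SDK headers).
OFFLINE_DEVICES_AMD = getattr(cl.context_properties, "OFFLINE_DEVICES_AMD", 0x403F)

amd = [p for p in cl.get_platforms() if "AMD" in p.name][0]
ctx = cl.Context(dev_type=cl.device_type.ALL,
                 properties=[(cl.context_properties.PLATFORM, amd),
                             (OFFLINE_DEVICES_AMD, 1)])
print([d.name for d in ctx.devices])  # should list the offline devices too

prog = cl.Program(ctx, "__kernel void noop(__global float *x) { }").build()
for dev, binary in zip(ctx.devices, prog.get_info(cl.program_info.BINARIES)):
    print(dev.name, "->", len(binary), "bytes of device binary")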
You don't need a graphics card; you can compile OpenCL programs for the CPU too. This works if you have an Intel or AMD CPU. Download the latest OpenCL SDK from the corresponding manufacturer's website and compile your OpenCL program:
Intel OpenCL SDK
AMD APP
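For instance, a minimal sketch with pyopencl (the kernel source and output file name are placeholders):
import pyopencl as cl

# Grab the first CPU device exposed by any installed runtime
# (Intel OpenCL SDK or AMD APP, per the links above).
cpu = [d for p in cl.get_platforms()
         for d in p.get_devices() if d.type & cl.device_type.CPU][0]
ctx = cl.Context([cpu])

src = "__kernel void scale(__global float *a) { a[get_global_id(0)] *= 2.0f; }"
prog = cl.Program(ctx, src).build()

# Save the device binary so it can be reloaded later without recompiling.
with open("scale_cpu.bin", "wb") as f:
    f.write(prog.get_info(cl.program_info.BINARIES)[0])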

Install AMD OpenCL CPU driver with an Nvidia graphics card

I have seen this question many times but never found an answer for Windows.
I recently ported my CUDA code to OpenCL.
When testing with an ATI card, the Catalyst drivers contain a CPU OpenCL driver, hence I can run the OpenCL code on the CPU.
When testing with an NVIDIA card, there is no driver for the CPU.
Question is: how can I install (and deploy) a CPU driver when running with an Nvidia card?
Thanks a lot
To use OpenCL on a CPU you don't need any driver; you only need an OpenCL runtime that supports CPUs, which (in the case of AMD/ATI) is part of the APP SDK. It can be installed no matter what GPU you have. Your end users would also have to install the APP SDK: currently, there is no way to install the OpenCL runtime alone.
If you have an Intel CPU, you'd better try the Intel OpenCL SDK, which has a separate installer. That said, the AMD APP SDK works quite well on Intel CPUs, but not vice versa.
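To check which of the installed runtimes actually exposes a CPU device, something like this pyopencl sketch works:
import pyopencl as cl

for platform in cl.get_platforms():
    cpus = [d.name for d in platform.get_devices()
            if d.type & cl.device_type.CPU]
    print(platform.name, "->", cpus if cpus else "no CPU device")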

glGetError hangs for several seconds

I am developing an OpenGL application and I am seeing some strange things happen. The machine I am testing with is equipped with an NVidia Quadro FX 4600 and it is running RHEL WS 4.3 x86_64 (kernel 2.6.9-34.ELsmp).
I've stepped through the application with a debugger and noticed that it hangs on OpenGL calls that query information from the OpenGL API, i.e. glGetError, glIsEnabled, etc. Each time it hangs, the system is unresponsive for 3-4 seconds.
Another thing that is interesting is that if this same code is run on RHEL 4.5 (Kernel 2.6.9-67.ELsmp), it runs completely fine. The same code also runs perfectly on Windows XP. All machines are using the exact same hardware:
PNY nVidia Quadro FX4600 768mb PCI Express
Dual Intel Xeon DP Quad Core E5345 2.33 GHz
4096 MB 667 MHz Fully Buffered DDR2
Super Micro X7DAL-E Intel 5000X Chipset Dual Xeon Motherboard
Enermax Liberty 620 watt Power Supply
I have upgraded to the latest 64-bit drivers: Version 177.82, Release Date: Nov 12, 2008, and the result is exactly the same.
Does anyone have any idea what could be causing the system to hang on these OpenGL calls?
It appears that this is an issue with less-than-perfect NVidia drivers for Linux. Upgrading to a newer kernel appears to help. If I am forced to use this dated kernel, there are some things I've tried that seem to help.
Setting the __GL_YIELD environment variable to "NOTHING" prior to starting X seems to increase stability with this older kernel.
http://us.download.nvidia.com/XFree86/Linux-x86_64/177.82/README/chapter-11.html
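For example, somewhere in the scripts that start X (the exact location varies by distribution):
export __GL_YIELD="NOTHING"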
I've also tried disabling Triple Buffering and Flipping.
I've also found that these forums are very helpful for Linux/NVidia problems; just do a search for "linux crash".
You may be able to dig deeper by using a system profiler like Sysprof or OProfile. Do other OpenGL applications using these calls exhibit similar behavior?
