I have an ATI graphics card in my laptop running Ubuntu 10.10. I installed the OpenCL library and tried to run programs on the GPU through Mathematica.
I imported the "OpenCLLink" package and checked OpenCLQ[], which returned True,
but when I run the example from the introduction in Mathematica, it doesn't work. All the program is supposed to do is double a vector: doubleFun[{1,2,3},3] should output {2,4,6}, but in my case the output is unchanged: {1,2,3}.
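For reference, the kernel behind that introductory example is plain OpenCL C. A minimal sketch of the doubling kernel it loads (reconstructed from memory of the OpenCLLink documentation, so the exact source may differ; mint is Mathematica's machine-integer type):

    __kernel void doubleFun(__global mint *in, mint length) {
        int index = get_global_id(0);
        if (index < length)
            in[index] = 2 * in[index];   /* double each element in place */
    }

If the output comes back unmodified, one possibility is that the kernel failed to build or launch and the input buffer was simply returned unchanged.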
What might be the cause of this problem? Is it a configuration issue or something else?
I just managed to successfully install SYCL ComputeCpp together with OpenCL (from CUDA), and ran cmake to generate the samples' VS2019 sln.
So far I've tried to run only the matrix_multiply example.
It ran successfully using the Intel FPGA emulator as the default device.
Changing the device to the CPU device worked well too.
Choosing the host device took ages without exiting.
When I tried to change the device to the nVidia GeForce GTX 1650 Ti, I got this exception error from ComputeCpp: RT0100, etc. etc.
Googling a bit, I found I'd probably have to output PTX instead of SPIR.
So I regenerated the sln using -DCOMPUTECPP_BITCODE=ptx64
After doing that, the kernel ran successfully on the nVidia GPU.
My first question is: is that needed because nVidia does not support SPIR yet at the time of this writing, but only PTX?
However, this broke the other devices, which now report:
[ComputeCpp:RT0107] Failed to create program from binary
This now happens for all the devices that were formerly working: Intel GPU, CPU device, FPGA device.
Inspecting the .sycl file, I now find SYCL_matrix_multiply_cpp_bin_nvptx64[].
My question is: how do I support nVidia with PTX and the "normal" devices with SPIR together in the same exe? I made a menu from which the user can choose which device to play with, but now it only works for nVidia.
What am I doing wrong, please?
I would expect to be able to run the same .sycl code on all the devices, regardless of whether it contains PTX or SPIR. How can I manage that?
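For what it's worth, the device menu itself can be written portably against the SYCL 1.2.1 API; here is a minimal sketch (independent of ComputeCpp's bitcode settings, which is where my problem seems to live):

    #include <CL/sycl.hpp>
    #include <iostream>
    #include <vector>

    int main() {
        namespace sycl = cl::sycl;
        // Enumerate every device the runtime can see, across all platforms.
        std::vector<sycl::device> devices = sycl::device::get_devices();
        for (std::size_t i = 0; i < devices.size(); ++i)
            std::cout << i << ": "
                      << devices[i].get_info<sycl::info::device::name>() << "\n";
        std::size_t choice = 0;
        std::cin >> choice;                 // let the user pick from the menu
        sycl::queue q(devices.at(choice));  // queue on the chosen device
        // ... submit the matrix_multiply kernel to q as usual ...
    }

The selection code runs fine everywhere; it's the kernel binary embedded in the .sycl header that has to match the chosen device.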
EDIT: I just tried retargeting the bitcode to spirv64, since computecpp_info told me all my devices are supposed to support it.
However, with that setting no device works anymore :-(
I'm currently struggling to determine how I can get an emulated environment via QEMU to correctly display output on the command line. I have an environment that displays perfectly well using the virt reference board, a Cortex-A9 CPU, and the 4.1 Linux kernel cross-compiled for ARM. However, if I swap out the 4.1 kernel for 2.6 or 3.1, suddenly I can no longer see console output.
While solving this issue is my main goal, I feel like I lack a critical understanding of how Linux and the hardware initially integrate before userspace configurations via boot scripts and whatnot have a chance to execute. I am aware of the device tree, and have a loose understanding of how it works. But the issue I ran into where a different kernel version broke console availability entirely confounds me. Can someone explain how Linux initially maps console output to a hardware device on the ARM architecture?
Thank you!
The answer depends quite a bit on which kernel version, what config options are set, what hardware, and also possibly on kernel command line arguments.
For modern kernels, the answer is that it looks in the device tree blob it is passed for descriptions of devices, some of which will be serial ports, and it initializes those. The kernel config or command line will specify which of those is to be used for the console. For earlier kernels, especially if you go all the way back to 2.6, use of device tree was less universal, and for some hardware the boot loader simply said "this is a versatile express board" (for instance) and the kernel had compiled-in data structures to tell it where the devices were for each board that it supported. As the transition to device tree progressed, boards were converted one by one, and sometimes a few devices at a time, so what exactly the situation was for any specific kernel version depends on which board you're using.
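To make that concrete: in device tree source, the console wiring typically looks something like the snippet below. The node name and address follow what QEMU's virt board generates (a PL011 UART at 0x09000000), but treat the details as illustrative:

    chosen {
        /* tells the kernel which device to use as the boot console */
        stdout-path = "/pl011@9000000";
    };

    pl011@9000000 {
        compatible = "arm,pl011", "arm,primecell";
        reg = <0x09000000 0x1000>;
    };

The kernel matches the compatible string against its compiled-in drivers and registers the UART; stdout-path (or a console= argument) then selects it as the console.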
The other thing that I rather suspect you're running into is that if the kernel crashes early in bootup (i.e. before it finds the serial port at all) then it will never output anything. So if the kernel is just too early to support the "virt" board properly at all, or if your kernel config is missing something important, then the chances are good that it crashes in early boot without being able to print you a useful message. (Sometimes "earlycon" or "earlyprintk" kernel arguments can assist here, but not always.)
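In practice, for a recent kernel on the virt board that means booting with something like this (flags are illustrative; the PL011 on virt shows up as ttyAMA0):

    qemu-system-arm -M virt -kernel zImage -nographic \
        -append "console=ttyAMA0 earlycon"

On a 2.6-era kernel, ARM device tree support was largely absent and the virt board wasn't known to it at all, which fits the symptom you describe; there the equivalent knob is earlyprintk, and it only helps if the kernel was configured (CONFIG_DEBUG_LL and friends) for the specific UART in question.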
I have multiple questions regarding debugging a Raspberry Pi 3 from a Linux x64 host using gdb-multiarch, as well as writing bare metal programs in general. We are currently facing a problem where our code appears to not be loaded into memory. When we begin debugging in GDB, we start at address 0; three instructions down, we jump to 0x10000. If I modify my linker script to place the code at either address, I get the same result: we jump to 0x10000 and our code isn't loaded there. Instead we get this:
We also noticed that GDB uses 32-bit register names here, even though we're supposed to be debugging 64-bit code.
Again, a recap of what we're using:
QEMU with the versatile-pb machine.
An aarch64 GCC cross-compiler.
GDB-multiarch.
We've tried on two different hosts: an Ubuntu 16.04 x64 host running in VirtualBox, and Mint x64 running natively.
We also tried the arm-none-eabi toolchain, but ran into problems because it doesn't compile our code as 64-bit.
Help is much appreciated! Thanks!
You don't give your command line, but "versatile-pb" is a 32-bit only board type, so trying to run 64-bit code on it is going to misbehave in confusing ways. You need to tell QEMU to emulate a 64-bit capable board that matches what your bare-metal code is expecting to run on.
In QEMU 2.12 there will be a "raspi3" QEMU board which may be helpful for you; you'd need to try building the latest 2.12 release candidate tarball at the moment if you wanted to experiment with that (2.12 release isn't due for another couple of weeks). Otherwise you could use the "virt" board if you made sure your bare metal code was built to be able to run on that board.
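For example (exact flags depend on your image, so treat this as a sketch), boot the kernel frozen with a gdb stub and then attach with the right architecture; forcing aarch64 should also cure the 32-bit register names you saw:

    qemu-system-aarch64 -M raspi3 -kernel kernel8.img -serial stdio -s -S

    gdb-multiarch kernel.elf
    (gdb) set architecture aarch64
    (gdb) target remote :1234

Here -s opens a gdb server on port 1234 and -S halts the CPU at startup. Whether gdb-multiarch picks the architecture up automatically depends on your ELF; setting it explicitly is the safe option.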
I am new to Android and wish to play around with the emulator.
What I want to do is to create my own piece of virtual hardware that can collect OpenGL commands and produce OpenGL graphics.
I have been told that in order to do this I will need to write a Linux kernel driver to enable communication with the hardware. Additionally, I will need to write an Android userspace library to call the kernel driver.
To start with, I plan on making a very simple piece of hardware that handles only, say, 1 or 2 commands.
Has anyone here done something like this? If so, do you have any tips or possible links to extra information?
Any feedback would be appreciated.
Writing a hardware emulation is a tricky task and by no means easy. So if you really want to do this, I'd not start from scratch. In your case I'd first start with something simpler (because many of the libraries are already in place on the guest and the host side): implementing an OpenGL passthrough for ordinary Linux through qemu. What does it take?
First you add some virtual GPU to qemu, which also involves adding a new graphics output module that uses OpenGL (so far qemu uses SDL). Next you create DRI/DRM drivers in the Linux kernel that will run on the guest (Android uses its own graphics system, but for learning, DRI/DRM is fine), as well as in Mesa. On the host side you must translate what comes from qemu into OpenGL calls. Since the host-side GPU is doing all the hard work, your DRI/DRM part will be quite minimal and just build a bridge.
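To make the "bridge" idea concrete, here is a purely hypothetical sketch of a guest-side stub; every name in it (vgpu_*, the ring, the doorbell) is made up for illustration. The real work is defining a command protocol like this between the guest driver and the qemu device model:

    /* Hypothetical guest-side stub, for illustration only. */
    #include <stdint.h>
    #include <string.h>

    #define VGPU_GLCLEAR 0x01              /* made-up opcode for glClear */

    struct vgpu_cmd {
        uint32_t opcode;                   /* which GL call */
        uint32_t nargs;                    /* number of argument words */
        uint32_t args[4];                  /* raw argument words */
    };

    /* MMIO regions exposed by the virtual GPU (mapped elsewhere) */
    extern volatile uint8_t  *vgpu_ring;
    extern volatile uint32_t *vgpu_doorbell;

    static void vgpu_glClear(uint32_t mask) {
        struct vgpu_cmd c = { VGPU_GLCLEAR, 1, { mask, 0, 0, 0 } };
        memcpy((void *)vgpu_ring, &c, sizeof c);  /* enqueue the command */
        *vgpu_doorbell = 1;                       /* kick the host side */
    }

On the qemu side, the doorbell write would trap into the device model, which decodes the struct and replays it as a real glClear(mask) on the host.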
The emulator that comes with Android SDK 23 already runs OpenGL, you can try this out with the official MoreTeapots example: https://github.com/googlesamples/android-ndk/tree/a5fdebebdb27ea29cb8a96e08e1ed8c796fa52db/MoreTeapots
I am pretty sure that it is hardware accelerated, since all those polygons are rendering at 60 FPS.
The AVD creation GUI in Android Studio has a hardware acceleration option, which should control options like:
==> config.ini <==
hw.gpu.enabled=yes
hw.gpu.mode=auto
==> hardware-qemu.ini <==
hw.gpu.enabled = true
hw.gpu.mode = host
hw.gpu.blacklisted = no
in ~/.android/avd/Nexus_One_API_24.a/.
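If I remember the flags correctly, the same thing can be forced when launching the emulator from the command line:

    emulator -avd Nexus_One_API_24 -gpu host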
I have used the ATI Stream SDK on Windows XP SP3 and implemented one algorithm on the GPU. But now I am interested in scaling this algorithm across multiple GPUs on multiple machines, so I switched to Ubuntu to use MPI (to send messages).
I googled this, but I only got references for installation on SLES and RHEL; I am looking for Ubuntu 9.04.
Thanks
GG
AMD is switching to an OpenCL-based API soon. Maybe it will be worthwhile holding your horses till the OpenCL API stabilizes. CUDA is far ahead of the curve in terms of GPU usability; there is a nice project called MAGMA which is bringing together the LAPACK library for joint CPU-GPU usage.
I know of people who are using the ATI Stream SDK and ACML-GPU on Ubuntu without any special problems -- that is, no problems that they wouldn't have on any other Linux distro.
If you can get the Catalyst drivers installed correctly (which in this case will probably mean compiling your kernel modules) and your X Windows configured correctly (especially the DRI module; there are also security issues if you want Stream to work with remote access), it should work.
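A quick sanity check after installation, assuming the usual Catalyst tools are on the PATH:

    fglrxinfo                    # should report an ATI/AMD OpenGL renderer
    aticonfig --list-adapters    # should list the GPU(s) the driver sees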
I'm tempted to ask/comment how you plan to share GPUs between multiple MPI processes, but that's probably wandering off-topic.