Does Alea GPU allow keeping LLVM IR code in the compilation chain?

Nvidia does not allow access to the generated LLVM IR in the compilation flow of a GPU kernel written in CUDA C/C++. I would like to know whether this is possible with Alea GPU. In other words, does the Alea GPU compilation procedure allow keeping the generated optimized/unoptimized LLVM IR code?

Yes, you are right: Nvidia does not show you the LLVM IR, you can only get the PTX code. Alea GPU, however, allows you to access the LLVM IR in several ways:
Method 1
You use the workflow-based method to code a GPU module as a template, compile the template into an LLVM IR module, and then link the IR module, optionally together with other IR modules, into a PTX module. Finally, you load the PTX module into the GPU worker. Once you have the IR module, you can call its Dump() method to print the IR code to the console, or get the bitcode as a byte[].
I suggest reading more details here:
Workflow-Based GPU Coding
Workflows in Detail
The F# code would look something like this:
let template = cuda {
    // define your kernel functions or other GPU module stuff
    let! kernel = <@ fun .... @> |> Compiler.DefineKernel

    // return an entry point for this module, something like the
    // main() function in a C program
    return Entry(fun program ->
        let worker = program.Worker
        let kernel = program.Apply kernel
        let main() = ....
        main ) }

let irModule = Compiler.Compile(template).IRModule
irModule.Dump() // dump the IR code
let ptxModule = Compiler.Link(irModule).PTXModule
ptxModule.Dump() // dump the PTX code
use program = worker.LoadProgram(ptxModule)
program.Run(...)
Method 2
If you are using the method-based or instance-based way to code a GPU module, you can add event handlers for the generated LLVM IR code and PTX code through Alea.CUDA.Events. The code in F# will look like:
let desktopFolder = Environment.GetFolderPath(Environment.SpecialFolder.Desktop)
let (##) a b = Path.Combine(a, b)
Events.Instance.IRCode.Add(fun ircode ->
    File.WriteAllBytes(desktopFolder ## "module.ir", ircode))
Events.Instance.PTXCode.Add(fun ptxcode ->
    File.WriteAllBytes(desktopFolder ## "module.ptx", ptxcode))
Extending GPU functions with LLVM code
Finally, there is an undocumented way to operate directly on LLVM IR to construct functions. It is done with an attribute that implements an IR-building interface. Here is a simple example that accepts a parameter, prints it (at compile time), and returns it back:
[<AttributeUsage(AttributeTargets.Method, AllowMultiple = false)>]
type IdentityAttribute() =
    inherit Attribute()
    interface ICustomCallBuilder with
        member this.Build(ctx, irObject, info, irParams) =
            match irObject, irParams with
            | None, irParam :: [] ->
                // irParam is of type IRValue; you can get the LLVM
                // native handle via irParam.LLVM. You can also get its
                // type via irParam.Type, which is of type IRType, and
                // again get the LLVMTypeRef handle via irParam.Type.LLVM.
                // You can optionally construct LLVM instructions here.
                printfn "irParam: %A" irParam
                Some irParam
            | _ -> None

[<Identity>]
let identity (x:'T) : 'T = failwith "this is a device function, better not call it from the host"

Related

Can asynchronous module definitions be used with abstract syntax trees on v8 engine to read third party dependencies? [duplicate]

I understand that eval (string-to-function) is impossible to use in the browsers' application programming interfaces, but there must be another strategy to use third-party dependencies without node.js on the v8 engine, given that Cloudflare does it in-house, unless they disable the exclusive method by necessity or otherwise on their edge servers for Workers. I imagine I could gather the AST of the CommonJS module, as I was able to with rollup watch, but what might the actual steps be, by tooling? I mention AMD because it seems to rely on string-to-function (about which I've noticed Mozilla MDN says little).
I have been exploring the require.js repositories, and they use either eval or an AST:
function DEFNODE(type, props, methods, base) {
  if (arguments.length < 4) base = AST_Node;
  if (!props) props = [];
  else props = props.split(/\s+/);
  var self_props = props;
  if (base && base.PROPS) props = props.concat(base.PROPS);
  var code = "return function AST_" + type + "(props){ if (props) { ";
  for (var i = props.length; --i >= 0; ) {
    code += "this." + props[i] + " = props." + props[i] + ";";
  }
  var proto = base && new base();
  if ((proto && proto.initialize) || (methods && methods.initialize))
    code += "this.initialize();";
  code += "}}";
  // constructor
  var cnstor = new Function(code)();
  if (proto) {
    cnstor.prototype = proto;
    cnstor.BASE = base;
  }
  if (base) base.SUBCLASSES.push(cnstor);
  cnstor.prototype.CTOR = cnstor;
  cnstor.PROPS = props || null;
  cnstor.SELF_PROPS = self_props;
  cnstor.SUBCLASSES = [];
  if (type) {
    cnstor.prototype.TYPE = cnstor.TYPE = type;
  }
  if (methods)
    for (i in methods)
      if (HOP(methods, i)) {
        if (/^\$/.test(i)) {
          cnstor[i.substr(1)] = methods[i];
        } else {
          cnstor.prototype[i] = methods[i];
        }
      }
  // a function that returns an object with [name]:method
  cnstor.DEFMETHOD = function (name, method) {
    this.prototype[name] = method;
  };
  if (typeof exports !== "undefined") exports[`AST_${type}`] = cnstor;
  return cnstor;
}

var AST_Token = DEFNODE(
  "Token",
  "type value line col pos endline endcol endpos nlb comments_before file raw",
  {},
  null
);
https://codesandbox.io/s/infallible-darwin-8jcl2k?file=/src/mastercard-backbank/uglify/index.js
https://www.youtube.com/watch?v=EF7UW9HxOe4
Is it possible to make a C++ addon just to add a default object for node.js named exports, or am I Y'ing up the wrong X?
'.so' shared library for C++ dlopen/LoadLibrary (or #include?)
“I have to say that I'm amazed that there is code out there that loads one native addon from another native addon! Is it done by acquiring and then calling an instance of the require() function, or perhaps by using uv_dlopen() directly?”
N-API: An api for embedding Node in applications
"[there is no ]napi_env[ just yet]."
node-api: allow retrieval of add-on file name - Missing module in Init
Andreas Rossberg - is AST parsing, or initialize node.js abstraction for native c++, enough?
v8::String::NewFromUtf8(isolate, "Index from C++!");
Rising Stack - Node Source
"a macro implicit" parameter - bridge object between
C++ and JavaScript runtimes
extract a function's parameters and set the return value.
#include <nan.h>

NAN_METHOD(Index) {
    info.GetReturnValue().Set(
        Nan::New("Index from C++!").ToLocalChecked()
    );
}

// Module initialization logic
NAN_MODULE_INIT(Initialize) {
    /* Export the `Index` function
       (equivalent to `export function Index (...)` in JS) */
    NAN_EXPORT(target, Index);
}
New module "App" Initialize function from NAN_MODULE_INIT (an atomic?-macro)
"__napi_something doesn't exist."
"node-addon-API module for C++ code (N-API's C code's headers)"
NODE_MODULE(App, Initialize);
Sep 17, 2013, 4:42:17 AM to v8-u...#googlegroups.com: "This comes up frequently, but the answer remains the same: scrap the idea. ;) Neither the V8 parser nor its AST are designed for external interfacing. In particular (1) V8's AST does not necessarily reflect JavaScript syntax 1-to-1, (2) we change it all the time, and (3) it depends on various V8 internals. And since all these points are important for V8, don't expect the situation to change. /Andreas"
V8 c++: How to import module via code to script context (5/28/22, edit)
"The export keyword may only be used in a module interface unit.
The keyword is attached to a declaration of an entity, and causes that
declaration (and sometimes the definition) to become visible to module
importers[ - except for] the export keyword in the module-declaration, which is just a re-use of the keyword (and does not actually “export” ...entities)."
SyntheticModule::virtual
ScriptCompiler::CompileModule() - "Corresponds to the ParseModule abstract operation in the ECMAScript specification."
Local<Function> foo_func = ...; // external
Local<Module> module = Module::CreateSyntheticModule(
    isolate, name,
    {String::NewFromUtf8(isolate, "foo")},
    [](Local<Context> context, Local<Module> module) {
        module->SetSyntheticModuleExport(
            String::NewFromUtf8(isolate, "foo"), foo_func);
    });
Context-Aware addons from node.js' commonjs modules
export module index;

export class Index {
public:
    const char* app() {
        return "done!";
    }
};

import index;
import <iostream>;

int main() {
    std::cout << Index().app() << '\n';
}
node-addon-api (new)
native abstractions (old)
"Thanks to the crazy changes in V8 (and some in Node core), keeping native addons compiling happily across versions, particularly 0.10 to 0.12 to 4.0, is a minor nightmare. The goal of this project is to store all logic necessary to develop native Node.js addons without having to inspect NODE_MODULE_VERSION and get yourself into a macro-tangle[ macro = extern atomics?]."
Scope Isolate (v8::Isolate), variable Local (v8::Local)
typed_array_to_native.cc
"require is part of the Asynchronous Module Definition AMD API[, without "string-to-function" eval/new Function()],"
node.js makes objects, for it is written in C++.
"According to the algorithm, before finding
./node_modules/_/index.js, it tried looking for express in the
core Node.js modules. This didn’t exist, so it looked in node_modules,
and found a directory called _. (If there was a
./node_modules/_.js, it would load that directly.) It then
loaded ./node_modules/_/package.json, and looked for an exports
field, but this didn’t exist. It also looked for a main field, but
this didn’t exist either. It then fell back to index.js, which it
found. ...require() looks for node_modules in all of the parent directories of the caller."
But Java?
I won't accept this answer until it works, but this looks promising:
https://developer.oracle.com/databases/nashorn-javascript-part1.html
If not to run a jar file or something, in the Worker:
https://github.com/nodyn/jvm-npm
require and build equivalent in maven, first, use "dist/index.js".
Specifically: ScriptEngineManager
https://stackoverflow.com/a/15787930/11711280
Actually: js.commonjs-require experimental
https://docs.oracle.com/en/graalvm/enterprise/21/docs/reference-manual/js/Modules/
Alternatively/favorably: commonjs builder in C (v8 and node.js)
https://www.reddit.com/r/java/comments/u7elf4/what_are_your_thoughts_on_java_isolates_on_graalvm/
Here I will explore v8/node.js src .h and .cc for this purpose
https://codesandbox.io/s/infallible-darwin-8jcl2k?file=/src/c.cpp
I'm curious why there is near machine-level C operability in Workers if not to use std::ifstream, and/or build locally, without node.js require.

How to find dma_request_chan() failure reason details?

In an external kernel module using the DMA Engine, calling dma_request_chan() returns an error pointer of value -19, i.e. -ENODEV or "No such device".
Now, in the active device tree I do find a dma-names entry with the name I'm trying to get a channel for, so my suspicion is that something else deeper in the forest is already not found.
How do I find out what's wrong?
Background:
I have a Zynq MP Ultrascale+ board here, with an FPGA design which uses AXI VDMA block to provide one channel of data to be received on the Cortex A's Linux, where the data is written to DDR4 by the FPGA and to be read from Linux.
I found that there is a Xilinx DMA driver included in the kernel (in the Xilinx source repo anyway, currently kernel version 5.6.0). That driver has no user-space interface, so an intermediate kernel driver is needed.
This is depicted, and they have an example, here: Section "4 DMA Proxy Design". I modified the code in dma-proxy.c from the zip file linked there so that it uses only the RX channel, i.e. it also only tries to request that one.
The code for that is here, to not make this post huge:
Modified dma-proxy.c at onlinegdb.com
Line 407 has the function create_channel(), which used to use dma_request_slave_channel(); that function ditches the error code of the function it wraps, so to see the error I am using dma_request_chan() instead.
The function create_channel() is called in dma_proxy_probe() at line 470 (the occurrences before that are deactivated by a compile switch).
So by way of this call, dma_request_chan() will be called with these parameters:
create_channel(pdev, &channels[RX_CHANNEL], "dma_proxy_rx", DMA_DEV_TO_MEM);
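For reference, a minimal sketch of decoding the error pointer instead of discarding it (the helper name request_rx_channel is hypothetical, not part of dma-proxy):
#include <linux/dmaengine.h>
#include <linux/platform_device.h>
#include <linux/err.h>

/* Hypothetical helper: request the RX channel and log why it failed. */
static struct dma_chan *request_rx_channel(struct platform_device *pdev)
{
    struct dma_chan *chan = dma_request_chan(&pdev->dev, "dma_proxy_rx");

    if (IS_ERR(chan)) {
        /* PTR_ERR() recovers the negative errno, e.g. -19 = -ENODEV */
        dev_err(&pdev->dev, "dma_request_chan() failed: %ld\n", PTR_ERR(chan));
        return NULL;
    }
    return chan;
}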
The Device Tree for my board has an added node for dma-proxy driver as is shown at the top of the dma-proxy.c
dma_proxy {
    compatible = "xlnx,dma_proxy";
    dmas = <&axi_dma_0 0>;
    dma-names = "dma_proxy_rx";
};
The name "axi_dma_0" matches with the name in the axi DMA device tree node:
axi_dma_0: dma@a0000000 {
    #dma-cells = <0x1>;
    clock-names = "s_axi_lite_aclk", "m_axi_s2mm_aclk";
    clocks = <0x3 0x47 0x3 0x47>;
    compatible = "xlnx,axi-dma-7.1", "xlnx,axi-dma-1.00.a";
    interrupt-names = "s2mm_introut";
    interrupt-parent = <0x1d>;
    interrupts = <0x0 0x2>;
    reg = <0x0 0xa0000000 0x0 0x1000>;
    xlnx,addrwidth = <0x28>;
    xlnx,sg-length-width = <0x1a>;
    phandle = <0x1e>;

    dma-channel@a0000030 {
        compatible = "xlnx,axi-dma-s2mm-channel";
        dma-channels = <0x1>;
        interrupts = <0x0 0x2>;
        xlnx,datawidth = <0x40>;
        xlnx,device-id = <0x0>;
    };
};
If I now look here:
% cat /proc/device-tree/dma_proxy/dma-names
dma_proxy_rx
Looks like my dma_proxy_rx, that I'm trying to request the channel for, is in there.
Edit:
In the boot log, I see this:
xilinx-vdma a0000000.dma: Please ensure that IP supports buffer length > 23 bits
irq: no irq domain found for interrupt-controller@a0010000 !
xilinx-vdma a0000000.dma: unable to request IRQ 0
xilinx-vdma a0000000.dma: WARN: Device release is not defined so it is not safe to unbind this driver while in use
xilinx-vdma a0000000.dma: Xilinx AXI DMA Engine Driver Probed!!
There are warnings - but in the end, the Xilinx AXI DMA Engine got "probed", meaning the lowest-level driver loaded and is ready, right?
So it looks to me like there should be my device, but the kernel disagrees.
I've got the same problem with a similar configuration. After digging through a lot of kernel source code (especially drivers/dma/xilinx/xilinx_dma.c), I solved the problem by changing the channel number in the dmas parameter from 0 to 1 in the dma-proxy device tree entry, like this:
dma_proxy {
    compatible = "xlnx,dma_proxy";
    dmas = <&axi_dma_0 1>;
    dma-names = "dma_proxy_rx";
};
It seems that the dma-proxy example is written for an AXI DMA block with both the mm2s (channel #0) and s2mm (channel #1) channels enabled. If the mm2s channel is removed from the AXI DMA block, the s2mm channel remains channel #1.

IONotificationPortCreate function call generates compiler error

I am having an issue with the IONotificationPortCreate function in IOKit:
var NotificationPort = IONotificationPortCreate(MasterPort)
IONotificationPortSetDispatchQueue(NotificationPort, DispatchQueue)
gives the following compiler error when NotificationPort is used in the function call in the second line:
'Unmanaged<IONotificationPort>' is not identical to 'IONotificationPort'
If I use the following code, based on the information in the Using Swift with Cocoa and Objective-C document, it compiles but generates a runtime error:
var NotificationPort = IONotificationPortCreate(MasterPort).takeRetainedValue()
IONotificationPortSetDispatchQueue(NotificationPort, DispatchQueue)
Thread 1: EXC_BAD_ACCESS(code=1, address=0xwhatever)
So I think I have the runtime error figured out: the IONotificationPort object does not have a takeRetainedValue method.
The crux of the problem, as I see it, is that the IONotificationPortCreate function creates an IONotificationPort object and returns a reference to it.
I have looked all over the place and there is lots of information about ways to pass references into a function call from Swift, but nowhere can I find how to deal with references as a return value.
Can Swift call an object by reference?
Or am I way off the mark here?
Here is the Objective-C code that I am trying to convert to Swift:
_notificationPort = IONotificationPortCreate(masterPort);
IONotificationPortSetDispatchQueue(_notificationPort, _controllerQueue);
Here is the complete code snippet from my Swift file:
//Get IOKit Master Port
var MasterPort: mach_port_t = 0
let BootstrapPort: mach_port_t = 0
var MasterPortReturnCode: kern_return_t = 0
MasterPortReturnCode = IOMasterPort(BootstrapPort, &MasterPort)
println("Master port returned as \(MasterPort) with return code of \(MasterPortReturnCode)")
//Set up notification port and send queue
let DispatchQueue = dispatch_queue_create("com.apparata.AVB_Browser", DISPATCH_QUEUE_SERIAL)
var NotificationPort = IONotificationPortCreate(MasterPort)
IONotificationPortSetDispatchQueue(NotificationPort, DispatchQueue)

Is there a good openCL wrapper for Ruby?

I am aware of:
https://github.com/lsegal/barracuda
Which hasn't been updated since 01/11
And
http://rubyforge.org/projects/ruby-opencl/
Which hasn't been updated since 03/10.
Are these projects dead? Or have they simply not changed because they work, and OpenCL/Ruby haven't changed since then? Is anybody using these projects? Any luck?
If not, can you recommend another OpenCL gem for Ruby? Or how is this sort of call usually done? Just call raw C from Ruby?
You can try opencl_ruby_ffi; it's actively developed (by a colleague of mine) and working well with OpenCL version 1.2. OpenCL 2.0 should also be supported soon.
sudo gem install opencl_ruby_ffi
In the Khronos forum you can find a quick example that shows how it works:
require 'opencl_ruby_ffi'
# select the first platform/device available
# improve it if you have multiple GPU on your machine
platform = OpenCL::platforms.first
device = platform.devices.first
# prepare the source of GPU kernel
# this is not Ruby but OpenCL C
source = <<EOF
__kernel void addition( float2 alpha, __global const float *x, __global float *y) {
  size_t ig = get_global_id(0);
  y[ig] = (alpha.s0 + alpha.s1 + x[ig])*0.3333333333333333333f;
}
EOF
# configure OpenCL environment, refer to OCL API if necessary
context = OpenCL::create_context(device)
queue = context.create_command_queue(device, :properties => OpenCL::CommandQueue::PROFILING_ENABLE)
# create and compile the OpenCL C source code
prog = context.create_program_with_source(source)
prog.build
# allocate CPU (=RAM) buffers and
# fill the input one with random values
a_in = NArray.sfloat(65536).random(1.0)
a_out = NArray.sfloat(65536)
# allocate GPU buffers matching the CPU ones
b_in = context.create_buffer(a_in.size * a_in.element_size, :flags => OpenCL::Mem::COPY_HOST_PTR, :host_ptr => a_in)
b_out = context.create_buffer(a_out.size * a_out.element_size)
# create a constant pair of float
f = OpenCL::Float2::new(3.0,2.0)
# trigger the execution of kernel 'addition' on 128 cores
event = prog.addition(queue, [65536], f, b_in, b_out,
                      :local_work_size => [128])
# Or, if you want to be more OpenCL-like:
# k = prog.create_kernel("addition")
# k.set_arg(0, f)
# k.set_arg(1, b_in)
# k.set_arg(2, b_out)
# event = queue.enqueue_NDrange_kernel(k, [65536], :local_work_size => [128])
# tell OCL to transfer the content of GPU buffer b_out
# to the CPU memory (a_out), but only after `event` (= kernel execution)
# has completed
queue.enqueue_read_buffer(b_out, a_out, :event_wait_list => [event])
# wait for everything in the command queue to finish
queue.finish
# now a_out contains the result of the addition performed on the GPU
# add some cleanup here ...
# verify that the computation went well
diff = (a_in - a_out*3.0)
65536.times { |i|
  raise "Computation error #{i} : #{diff[i]+f.s0+f.s1}" if (diff[i]+f.s0+f.s1).abs > 0.00001
}
puts "Success!"
You may want to package whatever C functionality you would like as a gem. This is pretty straightforward, and this way you can wrap all your C logic in a specific namespace that you can reuse in other projects.
http://guides.rubygems.org/c-extensions/
If you want to do high-speed calculations on the GPU, Cumo/NArray is a good choice. Cumo has the same interface as NArray, although it is CUDA rather than OpenCL.
https://github.com/sonots/cumo

debugging core files

I want to write a program which can read core files in Linux. However, I cannot find any documentation to guide me in this respect. Can someone please point me to some resources?
You can also take a look at GDB source code, gdb/core*.
For instance, in gdb/corelow.c, you can read at the end:
static struct target_ops core_ops;
core_ops.to_shortname = "core";
core_ops.to_longname = "Local core dump file";
core_ops.to_doc = "Use a core file as a target. Specify the filename of the core file.";
core_ops.to_open = core_open;
core_ops.to_close = core_close;
core_ops.to_attach = find_default_attach;
core_ops.to_detach = core_detach;
core_ops.to_fetch_registers = get_core_registers;
core_ops.to_xfer_partial = core_xfer_partial;
core_ops.to_files_info = core_files_info;
core_ops.to_insert_breakpoint = ignore;
core_ops.to_remove_breakpoint = ignore;
core_ops.to_create_inferior = find_default_create_inferior;
core_ops.to_thread_alive = core_thread_alive;
core_ops.to_read_description = core_read_description;
core_ops.to_pid_to_str = core_pid_to_str;
core_ops.to_stratum = process_stratum;
core_ops.to_has_memory = core_has_memory;
core_ops.to_has_stack = core_has_stack;
core_ops.to_has_registers = core_has_registers;
The struct target_ops defines a generic interface that the upper part of GDB uses to communicate with a target. This target can be a local Unix process, a remote process, a core file, ...
So if you only investigate what's behind these functions, you won't be overwhelmed by the generic part of the debugger implementation.
(Depending on your final goal, you may also want to reuse this interface and its implementation in your app; it shouldn't rely on too many other things.)
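To make the pattern concrete, here is a minimal sketch of the same idea (the names are illustrative, not GDB's actual types): a struct of function pointers acts as the target interface, and each target kind supplies its own implementation:
#include <stdio.h>

/* Illustrative only -- a tiny version of the target_ops idea. */
struct target_iface {
    const char *shortname;
    int  (*open)(const char *path);
    void (*close)(void);
};

static int core_open(const char *path)
{
    printf("opening core file %s\n", path);
    return 0;
}

static void core_close(void)
{
    printf("closing core file\n");
}

/* A "core" target; a live-process target would fill the same slots
 * with ptrace-based functions instead. */
static struct target_iface core_target = {
    .shortname = "core",
    .open      = core_open,
    .close     = core_close,
};

int main(void)
{
    /* The generic debugger layer only ever calls through the interface. */
    struct target_iface *t = &core_target;
    t->open("./core");
    t->close();
    return 0;
}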
Having a look at the source of gcore (http://people.redhat.com/anderson/extensions/gcore.c) might be helpful.
Core files can be examined using dbx(1), mdb(1), or one of the proc(1) tools.
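On Linux, a core dump is an ELF file of type ET_CORE: PT_LOAD segments carry the dumped memory, and PT_NOTE segments carry process state (registers, signal info, mapped files). If you want to write your own reader, a first step is walking the program headers. A minimal sketch, assuming a native-endian ELF64 core (real code must also validate e_ident):
#include <elf.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <corefile>\n", argv[0]);
        return 1;
    }
    FILE *f = fopen(argv[1], "rb");
    if (!f) {
        perror("fopen");
        return 1;
    }

    Elf64_Ehdr eh;
    if (fread(&eh, sizeof eh, 1, f) != 1 || eh.e_type != ET_CORE) {
        fprintf(stderr, "not an ELF64 core file\n");
        return 1;
    }

    /* Each program header describes either a memory segment (PT_LOAD)
     * or a note segment (PT_NOTE) holding the process state. */
    for (int i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        fseek(f, (long)(eh.e_phoff + (Elf64_Off)i * eh.e_phentsize), SEEK_SET);
        if (fread(&ph, sizeof ph, 1, f) != 1)
            break;
        printf("phdr %2d: type=%u vaddr=0x%llx filesz=0x%llx\n",
               i, ph.p_type,
               (unsigned long long)ph.p_vaddr,
               (unsigned long long)ph.p_filesz);
    }
    fclose(f);
    return 0;
}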
