How can I specify a minimum compute capability to the mexcuda compiler to compile a mexfunction? - compilation

I have a CUDA project in a .cu file that I would like to compile to a .mex file using mexcuda. Because my code uses the 64-bit floating-point atomic operation atomicAdd(double *, double), which is only supported on GPU devices of compute capability 6.0 or higher, I need to specify this as a flag when compiling.
In my standard IDE, this works fine, but when compiling with mexcuda, this is not working as I would like. In this post on MathWorks, it was suggested to use the following command (edited from the comment by Joss Knight):
mexcuda('-v', 'mexGPUExample.cu', 'NVCCFLAGS=-gencode=arch=compute_60,code=sm_60')
but when I use this command on my file, the verbose option spits out the following line last:
Building with 'NVIDIA CUDA Compiler'.
nvcc -c --compiler-options=/Zp8,/GR,/W3,/EHs,/nologo,/MD
-gencode=arch=compute_30,code=sm_30 -gencode=arch=compute_50,code=sm_50
-gencode=arch=compute_60,code=sm_60
-gencode=arch=compute_70,code=\"sm_70,compute_70\"
(and so on), which signals to me that the specified flag was not passed to the nvcc properly. And indeed, compilation fails with the following error:
C:/path/mexGPUExample.cu(35): error: no instance of overloaded function "atomicAdd" matches
the argument list. Argument types are: (double *, double)
The only other post I could find on this topic was this post on SO, but it is almost three years old and seemed to me more like a workaround - one which I do not understand even after some research, otherwise I would have tried it - rather than a true solution to the problem.
Is there a setting I missed, or can this simply not be done without a workaround?

I was able to work around this problem after some experimenting with the standard XML files in the MATLAB folder. The following steps allowed me to compile using mexcuda:
1) Go to the folder C:\Program Files\MATLAB\-version-\toolbox\distcomp\gpu\extern\src\mex\win64, which contains XML files for different versions of msvcpp.
2) Make a backup of the file that corresponds to the version you are using. In my case, I copied the file nvcc_msvcpp2017 and named the copy nvcc_msvcpp2017_old, so that I always have the original.
3) Open nvcc_msvcppYEAR with Notepad and scroll to the following block of lines:
COMPILER="nvcc"
COMPFLAGS="--compiler-options=/Zp8,/GR,/W3,/EHs,/nologo,/MD $ARCHFLAGS"
ARCHFLAGS="-gencode=arch=compute_30,code=sm_30 -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_70,code=\"sm_70,compute_70\" $NVCC_FLAGS"
COMPDEFINES="--compiler-options=/D_CRT_SECURE_NO_DEPRECATE,/D_SCL_SECURE_NO_DEPRECATE,/D_SECURE_SCL=0,$MATLABMEX"
MATLABMEX="/DMATLAB_MEX_FILE"
OPTIMFLAGS="--compiler-options=/O2,/Oy-,/DNDEBUG"
INCLUDE="-I"$MATLABROOT\extern\include" -I"$MATLABROOT\simulink\include""
DEBUGFLAGS="--compiler-options=/Z7"
4) Remove the architectures that will not allow your code to compile, i.e. all the architecture flags below 60 in my case:
ARCHFLAGS="-gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_70,code=\"sm_70,compute_70\" $NVCC_FLAGS"
5) After this, I was able to compile using mexcuda. You do not need to specify any architecture flags in the mexcuda call.
6) (optional) Once you are done with the project that required this change, you may want to restore the original file, so that code you compile afterwards stays as portable as possible.
Note: you will need administrator permission to make these changes.
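Step 4 can also be worked out programmatically before editing the file by hand. The following is a minimal Python sketch, assuming the ARCHFLAGS value has exactly the space-separated -gencode layout shown above. The function name drop_old_archs and the min_cc parameter are hypothetical, and the sketch only computes the new string; it does not touch the XML file.

```python
import re

# One -gencode entry; group 1 captures the virtual architecture number,
# e.g. 30 from "-gencode=arch=compute_30,code=sm_30".
GENCODE = re.compile(r"-gencode=arch=compute_(\d+),code=\S+")


def drop_old_archs(archflags, min_cc):
    """Return archflags without the -gencode entries below min_cc.

    Hypothetical helper: min_cc is the compute capability times ten
    (60 for compute capability 6.0). Anything that is not a -gencode
    entry, such as $NVCC_FLAGS, is kept unchanged.
    """
    def keep_or_drop(match):
        return match.group(0) if int(match.group(1)) >= min_cc else ""

    filtered = GENCODE.sub(keep_or_drop, archflags)
    # Collapse the gaps left behind by the removed entries.
    return " ".join(filtered.split())


original = (
    r'-gencode=arch=compute_30,code=sm_30 '
    r'-gencode=arch=compute_50,code=sm_50 '
    r'-gencode=arch=compute_60,code=sm_60 '
    r'-gencode=arch=compute_70,code=\"sm_70,compute_70\" $NVCC_FLAGS'
)
print(drop_old_archs(original, 60))
```

With min_cc set to 60, the printed value is the same ARCHFLAGS string as in step 4.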

Related

V8 : Isolate is incompatible with the embedded blob

I am trying to create a custom snapshot from a JavaScript file. I was able to create a snapshot using the command
mksnapshot.exe snapshot11.js --startup_blob snap.bin
but when I tried to create an Isolate with this snap.bin file, I got this message:
The Isolate is incompatible with the embedded blob. This is usually caused by incorrect usage of mksnapshot. When generating custom snapshots, embedders must ensure they pass the same flags as during the V8 build process (e.g.: --turbo-instruction-scheduling).
I am guessing that I need to recreate the snapshot with the proper flags, but I couldn't find out which flags I need to use.
My args.gn
is_component_build=true
v8_static_library=false
is_official_build=false
is_debug=true
use_custom_libcxx=false
use_custom_libcxx_for_host=false
target_cpu="x64"
use_goma=false
v8_use_external_startup_data=false
v8_enable_i18n_support = false
symbol_level=2
v8_enable_fast_mksnapshot=true
Any lead will be helpful.
10x
You can invoke ninja with -v to have it print all the commands it executes; e.g. if you compile V8 with:
ninja -v -C out/... v8_monolith
then you'll find a line for the mksnapshot invocation in the output, and can copy the flags from there. (If you have already compiled V8, ninja will say "nothing to do"; in that case you can either clean out everything, or just delete snapshot_blob.bin and libv8_monolith.so.)
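If the verbose output is long, the mksnapshot invocation can be picked out with a small script. This is a rough Python sketch, assuming the `ninja -v` output was saved as text and that the command line contains a token whose file name is mksnapshot or mksnapshot.exe. The function name mksnapshot_args and the sample log lines are invented for illustration and are not real V8 build output.

```python
import re


def mksnapshot_args(ninja_log):
    """Return the argument list of every mksnapshot invocation in the log.

    Hypothetical helper: splits each line on whitespace, so arguments
    that contain spaces are not handled.
    """
    invocations = []
    for line in ninja_log.splitlines():
        tokens = line.split()
        for index, token in enumerate(tokens):
            # Compare only the file name, so that paths such as
            # obj/mksnapshot/mksnapshot.o do not count as the tool itself.
            file_name = re.split(r"[\\/]", token.strip('"'))[-1]
            if file_name in ("mksnapshot", "mksnapshot.exe"):
                invocations.append(tokens[index + 1:])
                break
    return invocations


# Invented sample lines, only to show the shape of the result.
sample_log = """\
[1/3] clang++ -c ../../src/snapshot/mksnapshot.cc -o obj/mksnapshot/mksnapshot.o
[2/3] python ../../tools/run.py ./mksnapshot --turbo_instruction_scheduling --startup_blob snapshot_blob.bin
[3/3] touch obj/v8_snapshot.stamp
"""
for args in mksnapshot_args(sample_log):
    print(" ".join(args))
```

The printed arguments are the ones to compare against the flags passed when generating the custom snapshot.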

GnuCOBOL entry point not found

I've installed GnuCOBOL 2.2 on my Ubuntu 17.04 system. I've written a basic hello world program to test the compiler.
       IDENTIFICATION DIVISION.
       PROGRAM-ID. HELLO-WORLD.
      *---------------------------
       DATA DIVISION.
      *---------------------------
       PROCEDURE DIVISION.
           DISPLAY 'Hello, world!'.
           STOP RUN.
The source file is named HelloWorld.cbl. When I compile the program with the command
cobc HelloWorld.cbl
HelloWorld.so is produced. When I attempt to run the compiled program using
cobcrun HelloWorld
I receive the following error:
libcob: entry point 'HelloWorld' not found
Can anyone explain to me what an entry point is in GnuCOBOL, and perhaps suggest a way to fix the problem and successfully execute this COBOL program?
According to the official GnuCOBOL manual, you should compile your code with:
cobc -x HelloWorld.cbl
then run it with
./HelloWorld
You can also read the GnuCOBOL wiki page, which contains some examples, for further information.
P.S. As Simon Sobisch said, if you rename your file to HELLO-WORLD.cbl to match the program ID, the same commands that you used will work:
cobc HELLO-WORLD.cbl
cobcrun HELLO-WORLD
Can anyone explain to me what an entry point is in GnuCOBOL, and perhaps suggest a way to fix the problem and successfully execute this COBOL program?
An entry point is a point where you may enter a shared object (this is actually more C than COBOL).
GnuCOBOL generates entry points for each PROGRAM-ID, FUNCTION-ID and ENTRY. Therefore your entry point is HELLO-WORLD (the name likely gets converted, as - is not valid in an ANSI C identifier; you won't have to think about this when CALLing a program, as the conversion is done internally).
Internally, cobcrun does the following:
search for a shared object (in your case HelloWorld); as this is found (because you've generated it), it will be loaded
search for an entry point in all loaded modules, which isn't found
There are three possible options to get this working:
As mentioned in Ho1's answer: use cobc -x. This works because you don't generate a shared object at all but a C main which is called directly (= the entry point doesn't apply at all)
preload the shared object and call the program by its PROGRAM-ID (entry point), either manually with COB_PRE_LOAD=HelloWorld cobcrun HELLO-WORLD or through cobcrun (option available since GnuCOBOL 2.x) cobcrun -M HelloWorld HELLO-WORLD
change the PROGRAM-ID to match the source name (either rename or change the source, I'd do the second: PROGRAM-ID. HelloWorld.)
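The lookup described above can be modelled in a few lines of Python. This is only a toy model of the two steps (load a shared object with the given name, then search all loaded modules for the entry point), not libcob's actual implementation. The dictionary shared_objects, the function cobcrun_model and its preload parameter are made up for illustration, and the error text is simply reused from the question.

```python
# Toy model: every shared object on disk maps to the entry points it exports.
# HelloWorld.so was built from a source whose PROGRAM-ID is HELLO-WORLD.
shared_objects = {"HelloWorld": {"HELLO-WORLD"}}


def cobcrun_model(name, preload=()):
    """Mimic the lookup described above (hypothetical, heavily simplified)."""
    # Preloaded modules stand in for COB_PRE_LOAD or cobcrun -M.
    loaded = {module: shared_objects[module] for module in preload}
    # Step 1: look for a shared object with the given name and load it.
    if name in shared_objects:
        loaded[name] = shared_objects[name]
    # Step 2: search all loaded modules for an entry point with that name.
    for entry_points in loaded.values():
        if name in entry_points:
            return f"running {name}"
    return f"libcob: entry point '{name}' not found"


print(cobcrun_model("HelloWorld"))
print(cobcrun_model("HELLO-WORLD", preload=["HelloWorld"]))
```

The first call finds the module but no matching entry point, which mirrors the error in the question. The second call mirrors the preload option, where the module is loaded first and the program is then called by its PROGRAM-ID.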

F# Microsoft.ParallelArrays not defined

So I downloaded and installed Microsoft Accelerator v2 to use ParallelArrays. I have referenced it in my project but when I try and execute the code from the module in a script file I get:
"The namespace 'ParallelArrays' is not defined"
I have followed the instructions on this post:
Microsoft Accelerator library with Visual Studio F#
I've added a reference to the managed version "Microsoft.Accelerator.dll" to my F# project and then added the native "Accelerator.dll" as an item in my solution and set its 'Copy To Output Directory' to Copy Always.
I still get the FSI error and an inline error in my script file on the '#load ...' line; however, the solution builds fine and there is no error in the module file.
Any ideas on what I'm missing? I'm sure it's something stupid.
Thanks,
Justin
UPDATE
I tried mydogisbox's advice, which got rid of the error above, but now when I run the code in the .fsx file I get this error instead:
--> Referenced 'F:\Work\GitHub\qf-sharp\qf-sharp\bin\Debug\Microsoft.Accelerator.dll' (file may be locked by F# Interactive process)
[Loading F:\Work\GitHub\qf-sharp\qf-sharp\MonteCarloGPU.fs]
error FS0192: internal error: F:\Work\GitHub\qf-sharp\qf-sharp\Accelerator.dll: bad cli header, rva 0
UPDATE 2
So the bad header error has disappeared, but now I get this instead:
Microsoft.ParallelArrays.AcceleratorException: Failure to create a DirectX 9 device.
at Microsoft.ParallelArrays.ParallelArrays.ThrowNativeAcceleratorException()
at Microsoft.ParallelArrays.DX9Target..ctor()
at <StartupCode$FSI_0002>.$FSI_0002_MonteCarloGPU.main#() in F:\Work\GitHub\qf-sharp\qf-sharp\MonteCarloGPU.fs:line 14
Stopped due to error
I found this thread on MSDN; however, the answers proposed as fixes there barely relate to the question.
http://social.msdn.microsoft.com/Forums/vstudio/en-US/98600646-0345-4f62-a6c5-f03ac9c77179/ms-accelerator?forum=csharpgeneral
My DirectX version is 11, and I imagine that will suffice. I nevertheless tried installing DX9, but the installer tells me that a newer version was detected and therefore it can't install.
There are special directives for referencing dlls from fsi. The #load directive loads the .fs file only. You need to use the #r directive to reference the file. You can either use the full path of the file or you can use #I to include the path to the file. More details here. Keep in mind that fsi is completely independent of your project, so all references in your project must be duplicated in fsi for it to access the same types.

erlang debug_info option - inside module or during compilation?

After overcoming some trouble with the installation, I tried to use the Erlang debugger on a simple module:
I included the -compile([debug_info]). attribute in the source file and compiled with:
1> c(test_module).
This did not work as expected: After running
2> debugger:start().
the monitor window appeared, then I clicked
Module->Interpret...->test_module.erl
and got error
"Error when interpreting: test_module.erl: No debug_info in BEAM file".
Deleting -compile([debug_info]). line and changing
1> c(test_module).
to
1> c(test_module, [debug_info]).
solved the problem.
What is the difference between these two ways of setting a compilation option, and why does one work while the other does not?
According to the docs, the two ways should be equivalent:
Note that all the options except the include path ({i,Dir}) can also
be given in the file with a -compile([Option,...]). attribute.
(From Erlang -- compile doc)
Check this question for more info.

Resurrecting old PLT-Scheme project (pre-1999)

I'm trying to resurrect an old (1999 or earlier) project written in Scheme (PLT-Scheme, using the mzscheme interpreter (?) command-line tool). To make matters worse, I don't know Scheme or Lisp (in fact, I want to learn, but that's another story).
I have the source code of the project at:
github.com/akavel/sherman
Now, when running the code, it bails out with an error message like below:
Sherman runtime version 0.5
Hosted on MzScheme version 52, Copyright (c) 1995-98 PLT (Matthew Flatt)
reference to undefined identifier: list->block
(I've tried PLT-Scheme versions 52, 53, 103, 103p1. Earlier versions don't allow mzscheme -L option, which is referenced in the sherman.bat script used in the project. Later versions also have some more serious problems with the code or options.)
The difficulty is that, from what I can see, list->block actually is defined - see: collects/sherman/BLOCK.SS line 48. So, what is wrong?
To run the code, I perform the following steps:
Download PLT-Scheme v. 103p1 (from the old versions download page - first closing the "PLT Scheme is now Racket" banner) - for Windows, use: mz-103p1-bin-i386-win32.zip.
Unzip (e.g. to directory c:\PLT).
Copy c:\sherman\collects\sherman directory with contents to: c:\PLT\collects\sherman (where c:\sherman contains the contents of the github repository).
Run cmd.exe, then cd c:\sherman.
set PATH=c:\PLT;%PATH%
sherman.bat run trivial.s
This command is, as far as I understand, equivalent to:
(require-library "runtime.ss" "sherman")
(parameterize ((current-namespace sherman-namespace)) (load "trivial.s"))
(current-namespace sherman-namespace)
After that, I get the error as described above (MzScheme version would be reported as 103p1 or whatever).
Could you help me solve the problem?
EDIT 2: SOLVED!
To whom it may concern, I've added a fully fledged "How to use this project" instruction on the project page, detailing the solution to the problem thanks to soegaard's help.
In short:
copy trivial.s trivial.rs
rem (the above is workaround for problems with 'r2s.exe < trivial.r > trivial.rs')
sherman.bat compile trivial.rs
sherman.bat run trivial.zo
rem (or: sherman.bat run trivial.ss)
Not an answer, but a few notes too big for a comment.
1. Sanity Check
The error message says list->block is undefined.
To confirm that the code in block.ss actually runs,
insert (display "block.ss is loaded!") in block.ss
and check that the message is printed.
2. Random Thoughts
The file block.ss begins with:
(require-library "functios.ss")
(require-library "synrule.ss")
(require-library "stream.ss" "sherman")
The file "sherman/stream.ss" is in the repository,
but where are "synrule.ss" and "functios.ss"?
Ah... This code is old! Here is a description of
how require-library worked. It lists functios.ss
and synrule.ss as part of MzLib.
http://www.informatik.uni-kiel.de/~scheme/doc/mzscheme/node158.htm
Let's check out how require-library worked:
When require-library is used to load a file, the library name and the
resulting value(s) are recorded in a table associated with the current
namespace. If require-library is evaluated for a library that is
already registered in the current namespace's load table, then the
library is not loaded again; the result(s) recorded in the load table
is returned, instead.
So when the code in block.ss is run, the names are stored in a namespace. If the wrong namespace is current when the code in block.ss is evaluated, that would explain your error message about list->block being undefined.
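To make the hypothesis concrete, here is a toy Python model of the quoted behaviour. It is not how MzScheme is implemented. The Namespace class, the require_library function and the two namespace objects are invented for illustration; the only things taken from the quoted documentation are the per-namespace load table and the rule that an already registered library is not loaded again.

```python
class Namespace:
    """Toy namespace: top-level bindings plus its own load table."""

    def __init__(self):
        self.bindings = {}
        self.load_table = {}


def require_library(namespace, name, definitions):
    """Load a library into one namespace at most once (hypothetical model)."""
    if name not in namespace.load_table:
        # "Running" the library defines its names in this namespace only.
        namespace.bindings.update(definitions)
        namespace.load_table[name] = f"loaded {name}"
    # A repeated request returns the recorded result without reloading.
    return namespace.load_table[name]


startup_namespace = Namespace()
sherman_namespace = Namespace()

# block.ss is loaded while the startup namespace is the current one.
require_library(startup_namespace, "block.ss", {"list->block": "<procedure>"})

print("list->block" in startup_namespace.bindings)  # True
print("list->block" in sherman_namespace.bindings)  # False
```

In this model, code evaluated in sherman_namespace cannot see list->block even though block.ss did run, which is the kind of situation the error message would be consistent with.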

Resources