Issues with parallel Julia code on a Windows cluster

I'm setting up a small Windows cluster for parallel speedup of my Julia code (2x32 cores).
I have the following questions:
Is there a way to suppress loading of a module (e.g. "using PyPlot") on the remote machines? In my code, the workstation handles initialization and data presentation, whereas the cluster does the heavy calculation and has no need for PyPlot, DataFrames, etc.
Loading these packages on the remote machines is all the more annoying because PyPlot (and every other package) fails to populate the help database, producing the following error (actually many such errors, one from every worker):

    exception on 1: ERROR: opening file C:\Users\phlavenk\AppData\Local\Julia-0.3.6\bin/../share/julia\helpdb.jl: No such file or directory

I am running Julia 0.3.6 / x64 / Windows 7, with identical directory structures and versions everywhere.
My addprocs command is the following:

    addprocs(machines,
             sshflags=`-i c:\\cygwin64\\home\\phlavenk\\.ssh\\id_rsa`,
             dir="/cygdrive/c/Users/phlavenk/AppData/Local/Julia-0.3.6/bin",
             tunnel=true)
Thank you very much for your advice.

"using" causes a module to be loaded on all the processes. To load a module on a specific machine you use "include". e.g.
if myid()==1
include("/home/user/.julia/PyPlot/src/PyPlot.jl")
end
You can then do your plotting by
PyPlot.plot(...) on your local machine.

You could sequence the statements in this order:

    using PyPlot                           # before addprocs: master only
    using ModuleNeededOnMasterProcessOnly  # likewise master only
    addprocs(...)
    using ModuleNeededOnAllProcesses       # after addprocs: all processes
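A minimal, self-contained instance of that ordering (the package names here are just illustrative):

    using PyPlot           # loaded before any workers exist, so master only
    addprocs(2)            # the new workers never load PyPlot
    using Distributions    # loaded after addprocs, so available on all processes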

Related

Can not load or initialize mscordaccore.dll when analyzing a core dump with dotnet-dump analyze

I am trying to analyze a core dump using the dotnet-dump tool via cmd:

    tmp>dotnet-dump analyze core.2293
    Loading core dump: core.2293 ...
    Ready to process analysis commands. Type 'help' to list available commands or 'help [command]' to get detailed help on a command.
    Type 'quit' or 'exit' to exit the session.
As the documentation says, this brings up an interactive session that accepts a variety of commands for getting debug info.
In my case, however, every command fails with a message like this:

    > pe -lines
    Failed to load data access module, 0x80004002
    Can not load or initialize mscordaccore.dll. The target runtime may not be initialized.
    For more information see https://go.microsoft.com/fwlink/?linkid=2135652
    >

P.S. The link above doesn't help much.
Do you have any suggestions on how to fix it?
Solved. The limitation is that process dumps are not portable: it is not possible to diagnose dumps collected on Linux with Windows, and vice versa.
The dump was collected on a Linux machine, and I was trying to analyze it on a Windows machine.
To analyze it properly, you should set up a Linux environment. In my case, this was done by creating a docker container from an sdk:alpine image.
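A rough sketch of that setup (the image tag and tool path are assumptions; match the SDK to the runtime that produced the dump):

    # Start a Linux container with the dump directory mounted in
    docker run -it -v "$(pwd)":/dumps mcr.microsoft.com/dotnet/sdk:8.0-alpine sh
    # Inside the container: install dotnet-dump and open the dump
    dotnet tool install --global dotnet-dump
    export PATH="$PATH:/root/.dotnet/tools"
    dotnet-dump analyze /dumps/core.2293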

Julia: invoke a script in an existing REPL from the command line

I want to run a Julia script from the Windows command line, but it seems that every time I run > julia code.jl, a new instance of Julia is created, and the start-up time (package loading, compilation?) is quite long.
Is there a way to skip this start-up time by running the script in the current REPL/Julia instance? (That usually saves me 50% of the running time.)
I am using Julia 1.0.
Thank you,
You can use include:

    julia> include("code.jl")
There are several possible solutions. All of them involve different ways of sending commands to a running Julia session. The first few that come to my mind are:
- use sockets, as explained in https://docs.julialang.org/en/v1/manual/networking-and-streams/#A-simple-TCP-example-1
- set up an HTTP server, e.g. using https://github.com/JuliaWeb/HTTP.jl
- use named pipes, as explained in Named pipe does not wait until completion in bash
- communicate through the file system (e.g. make Julia scan some folder for .jl files; if it finds any, they get executed and then moved to another folder or deleted) - this is probably the simplest to implement correctly; see the sketch below
In all these solutions you send the command to Julia by executing some shell command.
No matter which approach you prefer, the key challenge is sanitizing the code to handle errors properly (i.e. the situation where a command you sent crashes the Julia session, or where you send requests faster than Julia can handle them). This is especially important if you want the Julia server to be detached from the terminal.
As a side note: when you use the Distributed module from the stdlib for multiprocessing, Julia actually does a very similar thing (but the communication is Julia to Julia), so you can also look at how that module is implemented to get a feel for how it can be done.
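To make the fourth option concrete, here is a minimal sketch of such a folder-scanning server (the folder names and polling interval are made up, and the error handling is deliberately simplistic):

    const INBOX = "jobs"       # scripts dropped here get executed
    const DONE  = "jobs_done"  # processed scripts are moved here
    mkpath(INBOX); mkpath(DONE)
    while true
        for f in filter(name -> endswith(name, ".jl"), readdir(INBOX))
            path = joinpath(INBOX, f)
            try
                include(path)                # run the script in this session
            catch err
                @warn "script failed" f err  # keep the server alive on errors
            end
            mv(path, joinpath(DONE, f), force=true)  # never run a file twice
        end
        sleep(1)  # poll once per second
    end

You leave this running in a Julia session and "submit" work from the shell by copying a .jl file into the jobs folder.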

what's the difference between spark-shell and submitted sbt programs

Spark-shell can be used to interact with data in distributed storage, so what is the essential difference between coding in spark-shell and packaging an independent application with sbt and submitting it to the cluster? (One difference I found is that jobs submitted via sbt show up in the cluster management interface, while shell jobs do not.) After all, sbt is quite troublesome, and the shell is very convenient.
Thanks a lot!
Spark-shell gives you a bare, console-like interface in which you can run your code as individual commands. This can be very useful if you're still experimenting with packages or debugging your code.
I found a difference is sbt submit the job can be seen in the cluster management interface, and the shell can not
Actually, the Spark shell also shows up in the job UI (as "Spark shell"), and you can monitor the jobs you run through it.
Building Spark applications with SBT brings some organization to your development process and gives you iterative compilation, which is helpful in day-to-day development and avoids a lot of manual work. If you have a fixed set of things you always run, you can simply run the same package again instead of re-entering everything as commands. SBT takes some getting used to if you are new to the Java style of development, but it helps maintain applications in the long run.
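For comparison, the packaged workflow looks roughly like this (the class name, master URL, and artifact path below are placeholders):

    sbt package
    spark-submit --class com.example.Main \
      --master spark://master:7077 \
      target/scala-2.12/myapp_2.12-0.1.jar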

how to parallelize the "make" command to distribute tasks across multiple machines

I've been compiling a .c/.c++ codebase that takes 1.5 hours to compile on a 4-core machine using the "make" command. I also have 10 more machines I can use for compiling. I know about the "-j" option in "make", which distributes compilation across a specified number of threads, but "-j" only distributes threads on the current machine, not on the other 10 machines connected to the network.
We could use MPI or another parallel programming technique, but then we would need to rewrite the build to fit that parallel programming model.
Is there any other way to make use of the other available machines for compilation?
thanks
Yes, there is: distcc.
distcc is a program to distribute compilation of C or C++ code across several machines on a network. distcc should always generate the same results as a local compile, is simple to install and use, and is often two or more times faster than a local compile.
Unlike other distributed build systems, distcc does not require all machines to share a filesystem, have synchronized clocks, or to have the same libraries or header files installed. Machines can be running different operating systems, as long as they have compatible binary formats or cross-compilers.
By default, distcc sends the complete preprocessed source code across the network for each job, so all it requires of the volunteer machines is that they be running the distccd daemon, and that they have an appropriate compiler installed.
The key is that you still keep your single make, but gcc then arranges things appropriately: the preprocessor and headers run locally, while compilation to object code is farmed out over the network.
I have used it in the past, and it is pretty easy to setup -- and helps in exactly your situation.
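In practice the setup is roughly this (the host names and subnet are placeholders):

    # On each helper machine: run the distcc daemon, allowing your subnet
    distccd --daemon --allow 192.168.0.0/24
    # On the build machine: list the helpers, then oversubscribe -j
    export DISTCC_HOSTS="localhost machine1 machine2"
    make -j20 CC=distcc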
https://github.com/icecc/icecream
Icecream was created by SUSE and is based on distcc. Like distcc, Icecream takes compile jobs from a build and distributes them among remote machines, allowing a parallel build. But unlike distcc, Icecream uses a central server that dynamically schedules the compile jobs to the fastest free server. This advantage pays off mostly for shared computers; if you're the only user on x machines, you have full control over them anyway.
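The moving parts are similar (the wrapper path varies by distribution, so treat this as illustrative):

    icecc-scheduler -d                     # on one machine: the central scheduler
    iceccd -d                              # on every participating machine
    export PATH=/usr/lib/icecc/bin:$PATH   # so gcc/g++ resolve to the icecc wrappers
    make -j20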

Stale NFS file handle issue on a remote cluster

I need to run a bunch of simulations using a tool called ngspice, and since I want to run a million of them, I am distributing them across a cluster of machines (a master plus one slave to start with, each with 12 cores).
This is the command:

    ngspice deck_1.sp; ngspice deck_2.sp ...
Step 1: A python script generates the sp files.
Step 2: Python invokes GNU parallel to distribute the sp files across the master/slave and run the simulations using ngspice (see the sketch below).
Step 3: I post-process the results (python script).
I generate and process only 1000 files at a time to save disk space, so Steps 1 to 3 are repeated in a loop until a million files have been simulated.
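Step 2 presumably amounts to something like the following (the slave host name is a placeholder; --transferfile pushes each deck to the remote machine rather than relying on a shared path):

    parallel -S :,slave --transferfile {} --cleanup ngspice {} ::: deck_*.sp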
Now, my problem is:
When I execute the loop for the first time, I have no problem: the files are distributed across the master/slave until the 1000 simulations are complete. When the loop starts the second time, I clear out the existing sp files and regenerate them (Step 1). Now, when I execute Step 2, for some strange reason some files are not detected. After some debugging, the errors I get are "Stale NFS file handle" and "No such file or directory deck_21.sp" etc., for certain sp files created in Step 1.
I paused my python script and did an 'ls' in the directory, and the files do exist; but, as the error points out, they are unreachable because of the stale NFS file handle. This link recommends remounting the client etc., but I am logged into a machine on which I have no admin privileges to mount anything.
Is there a way I can resolve this?
Thanks!
No. You need admin privileges to fix this.
