Running Julia code on multiple machines - parallel-processing

I have parallelized my algorithm using pmap. The performance improvement on one machine using the -p option is great. Now I would like to run on multiple machines.
I used the --machinefile option when starting Julia. It works, but it launches only one process on each remote machine. I would like to have multiple processes running on each machine. The -p option enables multiple processes only on the local machine. Is there a way to specify the number of processes on remote machines?

On Julia 0.3 you have to list each remote machine multiple times in the machine file to open multiple Julia workers on it.
On Julia 0.4 (unreleased at the time of writing) you can put a count next to each address; see this pull request.
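For example, on 0.4 a machine file like the following would launch four worker processes on each host (the hostnames are placeholders):
4*node1.example.com
4*node2.example.com
On 0.3 you would instead list node1.example.com four times, once per worker. Either way, you then start Julia with something like:
julia --machinefile machinefile myscript.jl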

Related

Jenkins jobs slow. Is it I/O related?

We have several jenkins pipeline jobs that are taking much longer to complete than we expect. Specific steps seem to "hang" for unwarranted periods of time. Running those same steps manually on another system runs significantly faster.
One example job is a step that uses Ruby to recurse through a bunch of directories and performs a shell command on each file in those directories. Running on our Ubuntu 14.04 Jenkins system takes about 50 minutes. Running the same command on my desktop Mac runs in about 10 seconds.
I did some experimentation on the Jenkins builder by running the Ruby command at the command prompt and got the same slow result that Jenkins had. I also removed Ruby from the equation by batching up each of the individual shell commands Ruby would have run and putting them in a shell script that runs each command sequentially. That took a long time as well.
I've read some posts suggesting that blocking on STDERR may be the reason. I then experimented with redirecting STDERR and STDOUT to /dev/null, and the commands finish in about 20 seconds. That is what I would expect.
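Roughly, the batched script looked like this (the directory and command names here are placeholders):
#!/bin/sh
# run one shell command per file, discarding all console output
for f in $(find ./some_dir -type f); do
    some_command "$f" > /dev/null 2>&1
done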
My questions are:
1. Would these slowdowns in execution time be the result of some I/O blocking?
2. What is the best way to fix this? In some cases I may want the output, so redirecting to /dev/null is probably not going to work. Is there a kernel or OS level change I can make?
Running on Ubuntu 14.04 Amazon EC2 instance R3.Large.
Ubuntu 14.04.5 LTS (GNU/Linux 3.13.0-108-generic x86_64)
ruby 2.1.5p273 (2014-11-13 revision 48405) [x86_64-linux]
Yes. Transferring huge amounts of data between the slave and the master does indeed lead to performance problems. This applies to storing build artifacts as well as to massive amounts of console output.
For console output, the performance penalty is particularly big if you use the Timestamper plugin. If it is enabled for your job, try disabling it first.
Otherwise, I'd avoid huge amounts of console output in general. Try to restrict console output to high-level job information that (in case of failure) provides links to further "secondary" logfile data.
Using I/O redirection (as you already did) is the proper way to accomplish that, e.g.
mycommand 2>mycommand.stderr.txt 1>mycommand.stdout.txt
This will always work (except for very special cases where you may need to select command-specific options to redirect a console output stream that a command explicitly creates on its own).
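If you do need the output for debugging, one option is to redirect it to files in the workspace instead of /dev/null and only surface a summary in the console. A minimal sketch (the script name is a placeholder):
# keep the console quiet, but preserve the full logs as workspace files
./build_step.sh > step.stdout.log 2> step.stderr.log
status=$?
# on failure, show only the tail of stderr in the console
[ $status -ne 0 ] && tail -n 50 step.stderr.log
exit $status
The log files can then be archived as build artifacts and linked from the job, in line with the advice above.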

How to parallelize the "make" command so it can distribute tasks across multiple machines

I have been compiling C/C++ code that takes 1.5 hours to build on a 4-core machine using the "make" command. I also have 10 more machines that I can use for compiling. I know about the "-j" option in "make", which distributes compilation across a specified number of jobs, but "-j" only uses the current machine, not the other 10 machines connected to the network.
We could use MPI or another parallel programming technique, but then we would need to rewrite the build around that parallel programming framework.
Is there any other way we can make use of the other available machines for compilation?
Thanks
Yes, there is: distcc.
distcc is a program to distribute compilation of C or C++ code across
several machines on a network. distcc should always generate the same
results as a local compile, is simple to install and use, and is often
two or more times faster than a local compile.
Unlike other distributed build systems, distcc does not require all
machines to share a filesystem, have synchronized clocks, or to have
the same libraries or header files installed. Machines can be running
different operating systems, as long as they have compatible binary
formats or cross-compilers.
By default, distcc sends the complete preprocessed source code across
the network for each job, so all it requires of the volunteer machines
is that they be running the distccd daemon, and that they have an
appropriate compiler installed.
The key is that you still keep your single make; gcc, wrapped by distcc, handles the files appropriately (running the preprocessor and header inclusion locally) but farms out the compilation to object code over the network.
I have used it in the past, and it is pretty easy to set up -- and it helps in exactly your situation.
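A minimal sketch of how a setup typically looks (the hostnames, subnet, and job count are placeholders; adjust to your network):
# on each of the 10 helper machines: run the distcc daemon
distccd --daemon --allow 192.168.1.0/24
# on the build machine: list the helpers, then keep your single make
export DISTCC_HOSTS="localhost machine1 machine2 machine3"
make -j20 CC="distcc gcc" CXX="distcc g++"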
https://github.com/icecc/icecream
Icecream was created by SUSE and is based on distcc. Like distcc, Icecream takes compile jobs from a build and distributes them among remote machines, allowing a parallel build. But unlike distcc, Icecream uses a central server that dynamically schedules the compile jobs to the fastest free server. This advantage pays off mostly for shared computers; if you're the only user on x machines, you have full control over them anyway.
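A rough sketch of an Icecream setup (binary names and the wrapper path follow Debian/Ubuntu packaging and may differ on your distro):
# one machine runs the scheduler, every build machine runs the daemon
icecc-scheduler -d        # on one machine only
iceccd -d                 # on each machine that should take jobs
# on the machine running make, put the icecc compiler wrappers first in PATH
export PATH=/usr/lib/icecc/bin:$PATH
make -j20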

Does Docker give RAM extra mileage?

According to countless sources, Docker provides ultra-lightweight virtualization by sharing system resources across containers, instead of allocating copies of those resources per container.
I've even read articles where it is boasted that you could "run dozens, even hundreds of containers on the same VM."
But if my app requires 2GB RAM to run, and the underlying physical machine has only 8GB RAM on it, I would normally only be able to run 3 instances of my app on it (leaving ~2GB for system memory, utilities, etc.).
Does Docker do some kind of magic with RAM, allowing me to actually run dozens of containers, each one allocated 2GB RAM, but somehow sharing unused memory under the hood?
Or are those statements more media hype than anything else?
When people talk about running "dozens or hundreds of containers" they are normally thinking about microservices: small applications that each do a specific task. Each of these may have memory usage measured in KB rather than MB, and probably not GB, and as such there is no reason a decent machine couldn't run dozens or hundreds of them.
There is actually a competition (I think it's ongoing) to get as many containers as possible running on a Raspberry Pi. The result currently stands at over a thousand, but admittedly these containers won't be running a real-life application.
Regarding memory, the answer is "it's complicated". If you're using the AUFS or Overlay driver, containers with the same base image should be able to share "memory pages"; meaning shared libraries shouldn't need to get loaded twice for two containers. This isn't something special though; normal processes running on the host will work the same way.
At the end of the day, containers are little more than isolated processes. We can easily run dozens or hundreds of processes on a host, so it's not unfeasible to run dozens or hundreds of containers.
A Docker container only consumes the resources that it needs as it needs them. So yes, you could literally run hundreds of containers on one box as long as they are not all actively consuming your resources. That is what makes Docker unique: a container will use what resources it can and then release them, making them available for another container on the same host. It is best practice to let the container and Docker handle allocating resources instead of doing a hard assign of them.
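For comparison, a hard assignment would look like the second line below; by default (first line) the container just uses what it needs (the image name is a placeholder):
docker run -d --name app1 my-app:latest          # no limit: shares host RAM freely
docker run -d -m 2g --name app2 my-app:latest    # hard cap of 2 GB for this container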
The alternative would be a virtual machine. Each virtual machine that you run has to run a full Linux kernel, and the host OS will hold a chunk of memory aside for the virtualized environment. This means that you can really only run a couple of VMs on all but the heaviest-duty hardware.
A container does NOT run a kernel; it just runs a single process (plus subprocesses). This means that you can run as many processes in containers as you could if you were running those same processes without containers: each thinks it is running on a separate machine, but they all just show up as processes on the host kernel.
There is no magic that will let you use RAM dozens of times over. But you can pack smaller processes in together a LOT tighter than you could using virtual machines for separation.

Merge two *.jtl files of a test report running on different machines

How can I merge reports of the same script running on different machines using JMeter?
I avoid remote testing, but I'm having an issue getting the combined results in one place while the script runs on all the machines.
Use a decent merge program like Beyond Compare
Write a merge script (see the sketch below)
Use remote testing as recommended
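For the merge-script option, if your .jtl files are in CSV format with a single header line, something like this is usually enough (the file names are placeholders):
# keep the header from the first file, then append the data rows of every file
head -n 1 machine1.jtl > merged.jtl
tail -q -n +2 machine1.jtl machine2.jtl >> merged.jtl
For XML-format .jtl files this won't work; you'd need an XML-aware merge (or one of the tools above) instead.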

Stale NFS file handle issue on a remote cluster

I need to run a bunch of simulations using a tool called ngspice, and since I want to run a million simulations, I am distributing them across a cluster of machines (a master plus a slave to start with, which have 12 cores each).
This is the command:
ngspice deck_1.sp; ngspice deck_2.sp etc.,
Step 1: A python script is used to generate these sp files.
Step 2: Python invokes GNU parallel to distribute the sp files across the master/slave and run the simulations using ngspice (a sketch of such an invocation is shown after these steps).
Step 3: I post-process the results (python script).
I generate and process only 1000 files at a time to save disk space. So the above Step 1 to 3 are repeated in a loop till a million files are simulated.
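For reference, step 2 is roughly of this shape (the ssh login file name and job count are placeholders; the working directory is assumed to be NFS-shared, as in my setup):
# nodes.txt lists the machines, e.g. ":" for the local master plus the slave's hostname
parallel --sshloginfile nodes.txt --jobs 12 ngspice {} ::: deck_*.sp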
Now, my problem is:
When I execute the loop the first time, I have no problem. The files are distributed across the master/slave until the 1000 simulations are complete. When the loop starts the second time, I clear off the existing sp files and regenerate them (step 1). Now, when I execute step 2, for some strange reason some files are not being detected. After some debugging, the errors I get are "Stale NFS file handle" and "No such file or directory deck_21.sp" etc., for certain sp files that were created in step 1.
I paused my Python script and did an 'ls' in the directory, and I see that the files actually exist, but as the error points out, the problem is the stale NFS file handle. This link recommends that I remount the client, but I am logged into a machine on which I have no admin privileges to mount anything.
Is there a way I can resolve this?
Thanks!
No. You need admin privileges to fix this.
