How to run scripts in parallel on Windows - shell

On Linux or Mac, I regularly run several Python scripts (or other programs, for that matter) in parallel. A use case could be that the script runs a simulation based on random numbers, and I just want to run it many times to get good statistics.
An easy way of doing this on Linux or Mac would be to use a for loop and an ampersand & to make the jobs run in parallel:
for i in {1..10}; do python script.py & done
Another use case would be that I want to run a script on some stored data in a file. Say I have a bunch of .npy files with stored data and I want to process them all with the same script, running 4 jobs in parallel (since I have a 4-core CPU). I could use xargs:
ls *.npy | xargs -P4 -n1 python script.py
Are there equivalent ways of doing this on the Windows command line?

Related

How to assign several cores/threads/tasks to an application that runs in parallel and is going to be run from the command line on a macOS laptop?

I have an application that can run commands in parallel. This application can work on a cluster, using SLURM to get the resources; then internally I assign each of the tasks I require to a different CPU/worker. Now I want to run this application on my laptop (macOS) through the command line. The same code (except for the SLURM part) works fine, the only difference being that it only performs one task at a time.
I have run code in parallel in MATLAB using the commands parcluster, parfor, etc. In this code, I can get up to 16 workers to work in parallel on my laptop.
I was hoping there is a similar solution for applications other than MATLAB to run code in parallel, especially for assigning the resources; my application itself is built to manage them.
If it is of any help, I run my application from the command line as follows:
chmod +x ./bin/OpenSees
./bin/OpenSees Run.tcl
I have read about GNU parallel or even using SLURM on my laptop, but I am not sure if these are the best (or feasible) solutions.
I tried using GNU parallel:
chmod +x ./bin/OpenSees
parallel -j 4 ./bin/OpenSees ::: Run.tcl
but it continues doing one at a time. Do you have any suggestions?
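For reference, GNU parallel starts one job per argument after :::, so a single Run.tcl can only ever yield a single job; parallelism comes from supplying several inputs. A hedged sketch, with hypothetical per-run input files run1.tcl through run4.tcl:
# hypothetical input files; one OpenSees job per file, up to 4 at a time
parallel -j 4 ./bin/OpenSees ::: run1.tcl run2.tcl run3.tcl run4.tcl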

Perl script is running slow when calling bash

I am running a Perl script that calls another script via the command line (the backticks make it run via the command line), but it runs extremely slowly.
for my $results (@$data) {
    `/home/data/push_to_web $results->{file}`;
}
If I run the same command via bash, /home/data/push_to_web book.txt, the same script runs extremely fast. If I build a bash file that contains
/home/data/push_to_web book_one.txt
/home/data/push_to_web book_two.txt
/home/data/push_to_web book_three.txt
the code executes extremely fast. Is there any secret to speeding Perl up when it calls another script?
Your Perl script fires up a new bash shell for each element in the array, whereas running the commands in bash from a file doesn't have to create any new shells.
Depending on how many files you have, and what's in your bash startup files, this could add a significant overhead.
One option would be to build a list of semicolon-separated commands in the for loop, and then run one system command at the end to execute them all in one bash process.
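Purely as an illustration with the example files above, the single string handed to one bash process would then look like the line below; the joining itself would happen in the Perl loop before a single system call.
# one bash process runs all three commands in sequence
/home/data/push_to_web book_one.txt; /home/data/push_to_web book_two.txt; /home/data/push_to_web book_three.txt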

How does one Parallelize a shell script with arguments using GNU parallel?

I am new to bash scripting. I have a shell script that runs several functions for longitudinal image processing in MATLAB through the terminal. I would like to parallelize the process in the terminal.
Here is a brief example of how it runs:
./script.sh *.nii -surface -m /Applications/MATLAB_R2018b.app/bin/matlab
*.nii refers to images from a single subject taken at different times (e.g. subj1img1, subj1img2, subj1img3). There are 3 images per subject in my case, so in each run the script runs through all images of a single subject.
I would like to parallelize this process so that I can run this script for multiple subjects at the same time. Reading through GNU parallel with my little experience, I wasn't able to figure out the code I need to write to make it happen. I'd really appreciate it if anyone has any suggestions.
parallel ./script.sh {} -surface -m /Applications/MATLAB_R2018b.app/bin/matlab ::: *.nii
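If each run really needs all three images of one subject, as the question describes, a hedged variant of the above (assuming the subj<N>img<M>.nii naming from the question and three subjects) would substitute the subject number into every {}:
# hypothetical file naming; one job per subject, three images per run
parallel ./script.sh subj{}img1.nii subj{}img2.nii subj{}img3.nii -surface -m /Applications/MATLAB_R2018b.app/bin/matlab ::: 1 2 3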
You can start them in the background using & in a for loop as below:
for f in *.nii
do
    ./script.sh "$f" -surface -m /Applications/MATLAB_R2018b.app/bin/matlab &
done

mpich pass argument per CPU used

Given: 2 Ubuntu 16.04 machines with multiple CPU cores.
I want to execute multiple instances of program fixed_arg arg2 on the machines, passing one file name per call as arg2 to the program.
So far, working with xargs, this works on a single machine:
find . -iname "*.ext" -print | xargs -n1 -P12 program fixed_arg
(This will find all files with the extension "ext" in the current directory (.), print one file per line (-print), and call xargs to run program up to 12 times in parallel (-P12) with only one argument arg2 per call (-n1). Note the white space at the end of the whole command.)
I want to use multiple machines on which I installed the "mpich" package from the official Ubuntu 16.04 repositories.
I just do not know how to make mpiexec run my program with only one argument on multiple machines.
I do know that mpiexec will accept a list of arguments, but my list will be in the range of 800 to 2000 files, which so far has been too long for any program.
Any help is appreciated.
You just selected the wrong instrument (or give us more details about your target program). MPI (the mpich implementation, with its mpiexec and mpirun commands) is not for starting unrelated programs on multiple hosts; it is for starting one program, with exactly the same source code, in such a way that the program knows how many copies of it exist (up to 100 thousand and more) and can do well-defined point-to-point and collective message passing between those copies. It is an instrument for parallelizing scientific codes, such as a computation over a huge array that can't be computed on a single machine or can't even fit into its memory.
A better instrument for you could be GNU parallel (https://www.gnu.org/software/parallel/); and if you have only one or two machines, or it is just several runs, it is easier to manually split your file list into two parts and run a parallel or xargs on each machine (by hand, or over ssh using authorized_keys). I'll assume that all files are accessible from both machines at the same path (an NFS share or something like it; no magic tool like MPI or GNU parallel will forward the files for you, though some modern batch processing systems may):
find . -iname "*.ext" -print > list                           # collect all input files
l=$(wc -l < list)                                             # count them
sp=$((l/2))                                                   # half of the list
split -l $sp list                                             # split into chunks of $sp lines (xaa, xab, ...)
cat xaa | xargs -n1 -P12 program fixed_arg &                  # first half on this machine
cat xab | ssh SECOND_HOST xargs -n1 -P12 program fixed_arg &  # second half on the remote host
wait
Or just learn about multi-host usage of GNU parallel: https://www.gnu.org/software/parallel/man.html
-S #hostgroup Distribute jobs to remote computers. The jobs will be run on a list of remote computers. GNU parallel will determine the number of CPU cores on the remote computers and run the number of jobs as specified by -j.
EXAMPLE: Using remote computers
It also has the magic of sending files to the remote machine with the --transferfile filename option if you have no shared FS between the two Ubuntu machines.
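A hedged multi-host sketch along those lines, assuming passwordless ssh to SECOND_HOST (the host name from the example above) and that program is installed at the same path on both machines; : stands for the local machine:
# one argument per job, 12 jobs per machine, input file copied to the remote host as needed
find . -iname "*.ext" -print | parallel -j12 -S :,SECOND_HOST --transferfile {} program fixed_arg {}
Adding --cleanup would remove the transferred copies on the remote host once each job finishes.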

Programmatically start a series of processes w/ job control

I have a series of 7 processes required to run a complex web app that I develop on. I typically start these processes manually like this:
job &>/tmp/term.tail &
term.tail is a fifo pipe I leave tail running on to see the output of these processes when I need to.
I'd like to find a way to start up all the processes within my current shell, but a typical script (shell or ruby) runs within its own shell. Are there any workarounds?
I'm using zsh in iTerm2 on OSX.
You can run commands in the current shell with:
source scriptfile
or
. scriptfile
A side note, your processes will block if they generate much output and there isn't something reading from the pipe (i.e. if the tail dies).
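As a concrete illustration, a minimal sketch of such a script file, with job1 through job7 standing in for the actual process commands and /tmp/term.tail being the fifo described above:
# start_all.sh - hypothetical name; job1..job7 stand in for the real commands
job1 &>/tmp/term.tail &
job2 &>/tmp/term.tail &
# ... job3 through job6 started the same way ...
job7 &>/tmp/term.tail &
Running source start_all.sh (or . start_all.sh) from the interactive zsh keeps all seven jobs in the current shell's job table, so jobs, fg, and bg work on them as usual.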
