How does one parallelize a shell script with arguments using GNU parallel? - bash

I am new to bash scripting. I have a shell script that runs several functions for longitudinal image processing in MATLAB from the terminal. I would like to parallelize the process.
Here is a brief example of how it runs:
./script.sh *.nii -surface -m /Applications/MATLAB_R2018b.app/bin/matlab
*.nii refers to images from a single subject taken at different times (e.g. subj1img1, subj1img2, subj1img3). There are 3 images per subject in my case, so in each run the script processes all images of a single subject.
I would like to parallelize this process so that I can run the script for multiple subjects at the same time. Reading through the GNU parallel documentation, with my limited experience I wasn't able to figure out the command I need. I'd really appreciate any suggestions.

parallel ./script.sh {} -surface -m /Applications/MATLAB_R2018b.app/bin/matlab ::: *.nii
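Note that this starts one job per image. If script.sh needs all three images of a subject in a single invocation (as the question suggests), you can hand GNU parallel three arguments at a time with -N3; a sketch, assuming plain ls order groups each subject's three images together:

ls *.nii | parallel -N3 ./script.sh {1} {2} {3} -surface -m /Applications/MATLAB_R2018b.app/bin/matlab

By default GNU parallel runs one job per CPU core; use -j to change that.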

You can start them in the background using & in a for loop, as below:
for f in *.nii
do
./script.sh "$f" -surface -m /Applications/MATLAB_R2018b.app/bin/matlab &
done
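Note that this launches every job at once. If you want to cap the number of concurrent runs in plain bash (without GNU parallel), one common pattern is to wait whenever the job count hits a limit; a minimal sketch, assuming bash 4.3+ for wait -n and an arbitrary cap of 4:

max_jobs=4
for f in *.nii
do
./script.sh "$f" -surface -m /Applications/MATLAB_R2018b.app/bin/matlab &
# pause until a slot frees up once max_jobs jobs are running
while (( $(jobs -rp | wc -l) >= max_jobs )); do wait -n; done
done
wait   # let the remaining jobs finish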

Related

How to run scripts in parallel on Windows

On Linux or Mac, I regularly run several Python scripts (or other programs, for that matter) in parallel. A use case could be that the script runs a simulation based on random numbers, and I just want to run it many times to get good statistics.
An easy way of doing this on Linux or Mac would be to use a for loop with an ampersand & to make the jobs run in parallel:
for i in {1..10}; do python script.py & done
Another use case would be running a script on stored data. Say I have a bunch of .npy files, and I want to process them all with the same script, running 4 jobs in parallel (since I have a 4-core CPU); I could use xargs:
ls *.npy | xargs -P4 -n1 python script.py
Are there equivalent ways of doing this on the Windows command line?

Show real-time progress with GNU Parallel and Stata

I'm using GNU parallel to run a Stata do file for many different data sets.
I have a Bash script that contains the following:
parallel -a arguments.txt -j 3 stata -b do $dofileloc {}
Since the do file has several different parts for each dataset, I would like to see progress in real time (e.g. display "data loaded for XYZ" after a part of the Stata do file finishes for a dataset, etc.).
So I'd like to redirect messages from Stata to the command line, but I'm having trouble doing this.
If I don't run Stata in batch mode I can see everything, which is a bit messy. I have tried using the shell command in Stata but I can't seem to figure out the correct combination.
I would appreciate any tips.
Does this do what you want?
parallel --tag --linebuffer -a arguments.txt -j 3 stata -b do $dofileloc {}
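--tag prefixes every line of output with the argument that produced it, and --linebuffer passes output through line by line while jobs are still running, so messages from different datasets stay readable instead of arriving in one block per finished job. If you also want an overall progress estimate, GNU parallel's --progress (or --eta) option can be added; a sketch:

parallel --tag --linebuffer --progress -a arguments.txt -j 3 stata -b do $dofileloc {}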

How to process multiple files in sequence using OpenMP and/or MPI?

I'm using the parallel_multicore version of the DBSCAN clustering algorithm available below:
http://cucis.ece.northwestern.edu/projects/Clustering/index.html
Running the code requires just the following line:
./omp_dbscan -i trial.txt -m 4 -e 0.5 -o output.txt -t 8
where -i is the input, -m and -e are two parameters, -o is the output and -t is the number of threads.
What I want to do is adapt this command so that I can process lots of input files (say trial_1.txt, trial_2.txt, trial_3.txt and so on) sequentially, but I'm not really sure how to do this. Any help would be greatly appreciated, as I'm thoroughly lost!
Any Unix server will have a shell installed.
Unix shells have been used to automate simple processes since the beginning of Unix; they are the scripting language at the very heart of any Unix system.
Their syntax is easy to learn, and it will let you automate such tasks easily. So find a tutorial on shell scripting!
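For this particular case, a minimal sketch in bash, assuming the inputs are numbered trial_1.txt through trial_10.txt (adjust the range and the output names to taste):

for i in {1..10}
do
./omp_dbscan -i "trial_${i}.txt" -m 4 -e 0.5 -o "output_${i}.txt" -t 8
done

Each iteration runs to completion before the next one starts, so the files are processed sequentially while each run still uses 8 threads internally.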

How do I curl multiple resources in one command?

Say I am trying to download a set of 50 lecture notes efficiently. These notes are inside the prof subdirectory of a university website. The 45th lecture note is inside the lect45 subdirectory as a pdf entitled lect45.pdf. I get my first pdf as follows:
curl -O http://www.university.edu/~prof/lect1/lect1.pdf
How do I get all my 50 notes efficiently using cURL and bash? I'm trying to do this from the command line, not through a Python / Ruby / Perl script. I know something like the below will generate a lot of 404s:
curl -O http://www.university.edu/~prof/lect{1..50}/lect{1..50}.pdf
so what will work better? I would prefer an elegant one-liner over a loop.
Do it in several processes:
for i in {1..50}
do
curl -O http://www.university.edu/~prof/lect$i/lect$i.pdf &
done
or as a one-liner (just different formatting):
for i in {1..50}; do curl -O http://www.university.edu/~prof/lect$i/lect$i.pdf & done
The & makes each command run in the background, so all 50 run in parallel.
Don't be scared by the output: the shell tells you that 50 processes have been started, and later it tells you, for each of them, that it has terminated. That's a lot of noise, but it's harmless.
You probably don't want to run all 50 in parallel ;-)
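If you want to limit how many downloads run at once without writing job-control logic yourself, xargs can cap the concurrency; a sketch with an arbitrary limit of 8 at a time:

seq 50 | xargs -P8 -I{} curl -O "http://www.university.edu/~prof/lect{}/lect{}.pdf"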
EDIT:
Your example uses {1..50} twice, which makes a Cartesian product of the numbers: try echo {1..3}/{1..3} to see what I mean. That is why you would create a lot of 404s.
Take a look at the GNU parallel shell tool (by default it runs one job per CPU core; use -j to change that).
For this particular case it will look like
seq 50 | parallel curl -O http://www.university.edu/~prof/lect{}/lect{}.pdf
As for curl: it traditionally has no parallel mechanism of its own (recent versions add a --parallel option), and arguably it doesn't need one. And your example with the shell expansion {1..50} looks valid to me.

Making a command loop in shell with a script

How can one loop a command/program in a Unix shell without writing the loop into a script or other application?
For example, I wrote a script that outputs a light sensor value, but I'm still testing it right now, so I want to run it in a loop by running the executable repeatedly.
Maybe I'd also like to run "ls" or "df" in a loop. I know I can do this easily in a few lines of bash code, but being able to type a one-liner in the terminal for any given command would be just as useful to me.
You can write the exact same loop you would write in a shell script on one line, putting semicolons where the line breaks would be:
for NAME [in LIST]; do COMMANDS; done
From there you could write a shell script called, for example, repeat that, given a command, runs it N times, by simply substituting the script's arguments for COMMANDS.
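A minimal sketch of such a script (the name repeat and the argument order, count first and then the command, are just one choice):

#!/bin/bash
# repeat: run the given command N times, e.g.  ./repeat 5 ./my_script.sh
n=$1
shift
for ((i = 0; i < n; i++)); do
"$@"
done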
I recommend "watch": it does exactly what you want, and it clears the terminal before each execution of the command, so it's easy to monitor changes.
You probably have it already; just try watch ls or watch ./my_script.sh. You can even control how long to wait between executions, in seconds, with the -n option, and you can use -d to highlight the differences between consecutive runs.
Try:
Run ls every second:
watch -n 1 ls
Run my_script.sh every 3 seconds, highlighting differences:
watch -n 3 -d ./my_script.sh
watch program man page:
http://linux.die.net/man/1/watch
This doesn't exactly answer your question, but I felt it was relevant. One of the great things about shell loops is that many commands output lists of items. Of course that is obvious, but something you can do with a for loop is execute a command on each item in such a list.
for file in $(find . -name '*.wma'); do cp "$file" ./new/location/; done
You can get creative and do some very powerful stuff.
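As an aside, word splitting makes that loop fragile for filenames containing spaces; find can run the command itself, which avoids the problem:

find . -name '*.wma' -exec cp {} ./new/location/ \;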
Aside from accepting arguments, anything you can do in a script can be done on the command line. Earlier I typed this directly into bash to watch a directory fill up as I transferred files:
while sleep 5s
do
ls photos
done
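or as a one-liner:
while sleep 5s; do ls photos; done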
