How to run a Python script that takes multiple arguments with GNU parallel in bash? - bash

I have a python script which I normally execute from BASH shell like this:
pychimera $(which dockprep.py) -rec receptor1.pdb -lig ligand1.mol -cmethod gas -neut
As you can see, some of the arguments take an input (e.g. -rec) while others don't (e.g. -neut). I must execute this script 154 times with different inputs. How can I run 8 of these jobs at a time using GNU parallel?
pychimera $(which dockprep.py) -rec receptor1.pdb -lig ligand1.mol -cmethod gas -neut
pychimera $(which dockprep.py) -rec receptor2.pdb -lig ligand2.mol -cmethod gas -neut
pychimera $(which dockprep.py) -rec receptor3.pdb -lig ligand3.mol -cmethod gas -neut
...

I think you want this:
parallel 'pychimera $(which dockprep.py) -rec receptor{}.pdb -lig ligand{}.mol -cmethod gas -neut' ::: {1..154}
GNU parallel defaults to one job per CPU core; if you have other than 8 cores and specifically want 8 processes at a time, use:
parallel -j8 ...
If you want to see the commands that would be run without actually running anything, use:
parallel --dry-run ...
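For example, a dry run limited to three inputs should just print the composed command lines; the $(which dockprep.py) part stays literal because the single quotes defer it to each job's shell (expected output sketched below, not captured from a real run):
$ parallel --dry-run 'pychimera $(which dockprep.py) -rec receptor{}.pdb -lig ligand{}.mol -cmethod gas -neut' ::: {1..3}
pychimera $(which dockprep.py) -rec receptor1.pdb -lig ligand1.mol -cmethod gas -neut
pychimera $(which dockprep.py) -rec receptor2.pdb -lig ligand2.mol -cmethod gas -neut
pychimera $(which dockprep.py) -rec receptor3.pdb -lig ligand3.mol -cmethod gas -neut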

Example commands.txt generator script:
#!/usr/bin/env bash
# cmdgen.sh: generate n dockprep command lines into commands.txt
if [ "$#" -ne 1 ]; then
    echo "missing parameter: n" >&2
    exit 1
fi
rm commands.txt 2> /dev/null   # start fresh; ignore the error if the file is absent
dockp=$(which dockprep.py)
for ((i = 1; i <= $1; i++)); do
    echo "pychimera $dockp -rec receptor$i.pdb -lig ligand$i.mol -cmethod gas -neut" >> commands.txt
done
If you save the above bash script as cmdgen.sh, you can run it as:
bash cmdgen.sh 100
if you need n to be 100.
To run commands in parallel:
$ module load parallel
$ parallel < commands.txt
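When no command is given, parallel treats each input line as a complete command, so the generator and the -j8 cap from the first answer combine naturally (a sketch, assuming cmdgen.sh from above):
$ bash cmdgen.sh 154
$ parallel -j8 < commands.txt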

Related

Is there any difference between "sh -c 'some command'" and directly running some command

Let's say we have the echo command; we can run it in two ways:
# by 1
echo 'hello'
# or by 2
sh -c "echo 'hello'"
Is there any difference between the two ways? By the way, I see that way 2 is very popular in YAML config files.
- name: init-mydb
  image: busybox:1.28
  command: ['sh', '-c', "sleep 2; done"]
The first way uses the command interpreter you inherited, e.g. from a terminal running /bin/bash; the second way execs sh (the Bourne shell) as the interpreter and instructs it (-c) to run the given string.
sh, ksh, csh and bash are all shell interpreters, and they provide features that are not always compatible between them. So if you don't know the environment where your program will run, the best option is to specify the interpreter you want, which is less error prone.
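For example, brace expansion is a bash feature that a plain POSIX sh lacks; on a system where /bin/sh is dash (assumed here, as on Debian/Ubuntu), the two interpreters give different results:
$ bash -c 'echo {1..3}'
1 2 3
$ sh -c 'echo {1..3}'
{1..3}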
This is a single command:
foo 1 2 3
So is this
sh -c 'foo 1 2 3'
This is not a single command (but rather a pair of commands separated by a ;)
foo; bar
but this is
sh -c "foo; bar"
This does not specify a command using the name of an executable file
for x in 1 2 3; do echo "$x"; done
but this does
sh -c 'for x in 1 2 3; do echo "$x"; done'
sh -c is basically a way to specify an arbitrary shell script as a single argument, rather than having to put it in a file to be executed.
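One detail worth knowing when wrapping scripts this way: any arguments after the script string become its positional parameters, with the first one filling $0 (hence the conventional _ placeholder). A minimal sketch:
$ sh -c 'echo "program: $0, first arg: $1"' _ hello
program: _, first arg: hello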

GNU Parallel: Run bash code that reads (seq numbers) from a pipe?

I would like parallel to read the numbers piped in from seq, so that running something like this:
seq 2000 | parallel --max-args 0 --jobs 10 "{ read test; echo $test; }"
Would be equivalent to running:
echo 1
echo 2
echo 3
echo 4
...
echo 2000
But unfortunately, the pipe was not read by parallel, meaning that it instead ran like:
echo
echo
echo
...
echo
And the output is empty.
Does anyone know how to make parallel read the (seq numbers) pipe? Thanks.
An alternative with GNU xargs that does not require GNU parallel:
seq 2000 | xargs -P 10 -I {} "echo" "hello world {}"
Output:
hello world 1
hello world 2
hello world 3
hello world 4
hello world 5
...
From man xargs:
-P max-procs: Run up to max-procs processes at a time; the default is 1. If max-procs is 0, xargs will run as many processes as possible at a time.
-I replace-str: Replace occurrences of replace-str in the initial-arguments with names read from standard input.
You want the input to be piped into the command you run, so use --pipe:
seq 2000 |
parallel --pipe -N1 --jobs 10 'read test; echo $test;'
Note the single quotes: with the double quotes from the question, $test is expanded by your interactive shell before parallel ever sees the command, which is why the output was empty.
But if you really just need it for a variable, I would do one of these:
seq 2000 | parallel --jobs 10 echo
seq 2000 | parallel --jobs 10 echo {}
seq 2000 | parallel --jobs 10 'test={}; echo $test'
I encourage you to spend 20 minutes reading chapters 1+2 of https://doi.org/10.5281/zenodo.1146014. Your command line will love you for it.
Using xargs instead of parallel, while still using a shell (rather than starting a new copy of the /bin/echo executable for every line), would look like:
seq 2000 | xargs -P 10 \
  sh -c 'for arg in "$@"; do echo "hello world $arg"; done' _
This is likely to be faster than the existing answer by Cyrus, because starting executables takes time. Even though starting a new copy of /bin/sh takes longer than starting a copy of /bin/echo, this version doesn't use -I {}, so it can pass many arguments to each copy of /bin/sh, amortizing the startup cost over many numbers. It also uses the echo built into sh instead of the separate echo executable.
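If you want to measure the difference yourself, something like this rough timing sketch works (numbers will vary by machine; output is discarded so mostly process startup is measured):
time (seq 2000 | xargs -P 10 -I {} echo "hello world {}" > /dev/null)
time (seq 2000 | xargs -P 10 sh -c 'for arg in "$@"; do echo "hello world $arg"; done' _ > /dev/null)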

Running bash script in parallel

I have a very simple command that I would like to execute in parallel rather than sequential.
for i in ../data/*; do ./run.sh "$i"; done
run.sh processes the input files from the ../data directory and I would like to perform this process all at the same time using a shell script rather than a Python program or something like that. Is there a way to do this using GNU Parallel?
You can try this:
shopt -s nullglob
FILES=(../data/*)
[[ ${#FILES[@]} -gt 0 ]] && printf '%s\0' "${FILES[@]}" | parallel -0 --jobs 2 ./run.sh
The null-delimited handoff (printf '%s\0' plus parallel -0) keeps filenames containing spaces or newlines intact.
I have not used GNU Parallel, but you can use & to run your script in the background. Add a wait later (optional) if you want to wait for all the scripts to finish.
for i in ../data/*; do ./run.sh "$i" & done
# Below wait command is optional
wait
echo "All scripts executed"
You can try this:
find ../data -maxdepth 1 -name '[^.]*' -print0 | parallel -0 --jobs 2 ./run.sh
The -name argument to find is needed because the question used the shell glob ../data/*, which skips files starting with a dot, so we have to ignore those here too.

How to run parallel for loops

I'm not very familiar with bash, but I would like to split up this code so that I can run it on a server with 12 processors:
#!/bin/bash
#bashScript.sh
for i in {1..209}
do
    Rscript Compute.R $i
done
How would I go about achieving this?
Thanks!
Use xargs with the option --max-procs (-P). If there are enough arguments, xargs will use exactly this number of concurrent processes to process the input:
#! /bin/bash
seq 209 |
xargs -P12 -r -n1 Rscript Compute.R
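If you want to watch what xargs launches, the -t flag prints each command line to stderr just before running it (shown here on a shortened input):
seq 3 | xargs -P12 -r -n1 -t Rscript Compute.R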
Try:
#!/bin/bash
#bashScript.sh
for i in {1..209}
do
    Rscript Compute.R $i &
done
Use GNU Parallel:
parallel Rscript Compute.R ::: {1..209}
10-second installation:
wget -O - pi.dk/3 | sh
Watch the intro video for a quick introduction:
https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1

How to correctly wrap multiple command calls in bash?

My problem can be summed up by making this simple command work:
nice -n 10 "ls|xargs -I% echo \"%\""
which fails:
nice: ls|xargs -I% echo "%": No such file or directory
Removing the quotes makes it work, but my point is to wrap multiple quoted commands into one to do something more complex, like:
ftphost="192.168.1.1"
dirinputtopush="/tmp/archivedir/"
ftpoutputdir="mydir/"
nice -n 19 ls $dirinputtopush | xargs -I% "lftp $ftphost -e \"mirror -R $dirinputtopush% $ftpoutputdirrecent ;quit\"; sleep 10"
Try using nice -n 10 bash -c 'your; commands | or_complex pipelines' as the command. This way bash is the binary, and the string after -c is interpreted by bash, so it can contain pipelines, loops, etc. Watch out for proper quoting. You need to do it this way because nice expects a binary, not an expression interpreted by the shell. In contrast, shell builtins such as time (but not /usr/bin/time, which is a separate binary) accept shell expressions as the command to execute; they can because they're built into the shell. nice is not, so it requires a binary to execute.
Children inherit nice value:
nice -n 10 bash -c 'ls | xargs -I% echo %'
Nice each command separately:
nice -n 10 ls | nice -n 10 xargs -I% echo %
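You can verify that the child really inherits the niceness (a quick check; ps -o ni= prints a process's nice value on Linux, and $$ inside the single quotes is the inner bash's PID):
$ nice -n 10 bash -c 'ps -o ni= -p $$'
10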
