For loop in parallel - bash

Is there a quick, easy, and efficient way of running iterations in this for loop in parallel?
for i in $(seq 1 5000); do
    repid="$(printf "%05d" "$i")"
    inp="${repid}.inp"
    out="${repid}.out"
    /command "$inp" "$out"
done

If you want to take advantage of all your lovely CPU cores that you paid Intel so handsomely for, turn to GNU Parallel:
seq -f "%05g" 5000 | parallel -k echo /command {}.inp {}.out
If you like the look of that, run it again without the -k (which keeps the output in order) and without the echo. You may need to enclose the command in single quotes:
seq -f "%05g" 5000 | parallel '/command {}.inp {}.out'
By default it will run one instance per CPU core in parallel, but if you want, say, 32 in parallel, use:
seq ... | parallel -j 32 ...
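Combining the pieces above, a full run with 32 jobs at a time would be:
seq -f "%05g" 5000 | parallel -j 32 '/command {}.inp {}.out'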
If you want an "estimated time of arrival", use:
parallel --eta ...
If you want a progress meter, use:
parallel --progress ...
If you have bash version 4+, it can zero-pad brace expansions, and if your ARGMAX is big enough you can more simply use:
parallel 'echo command {}.inp {}.out' ::: {00001..05000}
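You can see the zero-padding bash applies with a quick test:
$ echo {001..005}
001 002 003 004 005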
You can check your ARGMAX with:
sysctl kern.argmax
and it tells you how many bytes long your parameter list can be. You are going to need 5,000 numbers at 5 digits plus a space each, so 30,000 minimum.
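If you are on Linux rather than macOS, the equivalent check is:
getconf ARG_MAX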
If you are on macOS, you can install GNU Parallel with homebrew:
brew install parallel

Alternatively, simply append & to each command to run it in the background:
for i in $(seq 1 5000); do
    repid="$(printf "%05d" "$i")"
    inp="${repid}.inp"
    out="${repid}.out"
    /command "$inp" "$out" &
done
wait
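Beware that this backgrounds all 5,000 jobs at once, which can easily overwhelm the machine. If you want a cap on concurrency without GNU Parallel, here is a minimal sketch using bash 4.3+'s wait -n (the limit of 8 simultaneous jobs is just an example):
for i in $(seq 1 5000); do
    repid="$(printf "%05d" "$i")"
    while (( $(jobs -rp | wc -l) >= 8 )); do
        wait -n    # block until any one background job finishes
    done
    /command "${repid}.inp" "${repid}.out" &
done
wait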

Related

parallel execution of script on cluster for multiple inputs

I have a script pytonscript.py that I want to run on 500 samples. I have 50 CPUs available and want to run the script in parallel using 1 CPU for each sample, so that 50 samples are constantly running with 1 CPU each. Any ideas how to set this up without typing 500 lines with the different inputs? I know how to make a loop for each sample, but not how to make 50 samples running in parallel. I guess GNU parallel is a way?
Input samples in folder samples:
sample1
sample2
sample3
...
sample500
pytonscript.py -i samples/sample1.sam.bz2 -o output_folder
What about GNU xargs?
printf '%s\0' samples/sample*.sam.bz2 |
    xargs -0 -n1 -P 50 pytonscript.py -o output_folder -i
This launches a new instance of the python script for each file, concurrently, maintaining a maximum of 50 at once.
If the wildcard glob expansion isn't specific enough, you can use bash's extglob: shopt -s extglob; # then match samples/sample+([0-9]).sam.bz2
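Combined with the pipeline above, that might look like:
shopt -s extglob
printf '%s\0' samples/sample+([0-9]).sam.bz2 |
    xargs -0 -n1 -P 50 pytonscript.py -o output_folder -i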

Shortest shell code to run both traceroute and traceroute6

I'd like to run both traceroute -w2 and traceroute6 -w2, sequentially, in a shell script, to try out multiple different hosts.
A naive approach might gather all the hosts into a temporary variable (e.g., set HOSTS to ordns.he.net one.one.one.one google-public-dns-a.google.com) and then pipe it individually to each command, like echo $HOSTS | xargs -n1 traceroute -w2 et al. But this would work differently in tcsh than in bash, and it is prone to mistakes when you want to add more commands (as you'd be adding those as code as opposed to a list of things to do). I'm thinking there is some better way to join together the list of commands (e.g., a command name with a single parameter) with the list of arguments (e.g., hostnames in our example), for the shell to execute every possible combination.
I've tried doing some combination of xargs -n1 (for hosts) and xargs -n2 (for commands with one parameter) piping into each other, but it didn't really make much sense and didn't work.
I'm looking for a solution that doesn't use any GNU tools and would work in a base OpenBSD install (if necessary, perl is part of the base OpenBSD, so, it's available as well).
If you have perl:
perl -e 'for(@ARGV){ print qx{ traceroute -w2 -- $_; traceroute6 -w2 -- $_ } }' google.com debian.org
As for a better way to join together the list of commands (e.g., a command name with a single parameter) with the list of arguments (e.g., hostnames) the answer could be GNU Parallel, which is built for doing just that:
parallel "{1}" -w2 -- "{2}" ::: traceroute traceroute6 ::: google.com debian.org
If you want special arguments connected with each command you can do:
parallel eval "{1}" -- "{2}" ::: "traceroute -a -w2" "traceroute6 -w2" ::: google.com debian.org
The eval is needed because GNU Parallel quotes all input: without it, {1} would be passed as a single quoted argument and the shell would look for a program literally named "traceroute -a -w2". You normally want that quoting, but not in this case.
But since that is a GNU tool, it is out of scope for your question. It is only included here for other people that read your question and who do not have that limitation.
Keeping it simple:
#!/bin/sh
set -- host1 host2 host3 host4 ...
for host do traceroute -w2 -- "$host"; done
for host do traceroute6 -w2 -- "$host"; done
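With the hosts from the question, for example:
#!/bin/sh
set -- ordns.he.net one.one.one.one google-public-dns-a.google.com
for host do traceroute -w2 -- "$host"; done
for host do traceroute6 -w2 -- "$host"; done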
With GNU Parallel, the final solution for the problem at hand would be something like the following snippet, using tcsh syntax, and OS X traceroute and traceroute6:
( history 1 ; parallel --keep-order -j4 eval \{1} -w1 -- \{2} '2>&1' ::: "traceroute -a -f1" traceroute6 ::: ordns.he.net ns{1,2,3,4,5}.he.net one.one.one.one google-public-dns-a.google.com resolver1.opendns.com ; history 1 ) |& mail -s "traceroute: ordns.he.net et al from X" recipient@example.org -f sender@example.org

Generating sequences of numbers and characters with bash

I have written a script that accepts two files as input. I want to run all of the runs in parallel at the same time on different CPUs.
inputs:
x00.A x00.B
x01.A x01.B
...
x30.A x30.B
instead of running 30 times:
./script x00.A x00.B
./script x01.A x01.B
...
./script x30.A x30.B
I wanted to use paste and seq to generate the pairs and send them to the script.
paste & seq | xargs -n1 -P 30 ./script
But I do not know how to combine letters and numbers using paste and seq commands.
for num in $(seq -f %02.f 0 30); do
    ./script x$num.A x$num.B &
done
wait
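The -f format is what produces the zero-padded numbers matching the x00.A file names; a quick check:
$ seq -f %02.f 0 3
00
01
02
03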
Although I personally prefer (ksh/bash) builtins to GNU seq or BSD jot:
num=-1
while (( ++num <= 30 )); do
    printf -v id '%02d' "$num"    # zero-pad so the names match x00.A .. x30.B (bash; in ksh you could use typeset -Z2)
    ./script "x$id.A" "x$id.B" &
done
wait
The final wait is needed to make sure they all finish, after having been spread across your available CPU cores in the background. So if you need the output of ./script, you must keep the wait.
Putting them into the background with & is the simplest way to get parallelism. If you really want to exercise any sort of control over lots of backgrounded jobs like that, you will need some sort of framework like GNU Parallel instead.
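For example, assuming GNU Parallel is installed, the whole run could be expressed as:
parallel ./script {}.A {}.B ::: x{00..30}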
You can use pure bash for generating the sequence:
printf "%s %s\n" x{00..30}.{A..B} | xargs -n1 -P 30 ./script
Happy holidays!

How to start a large number of quick jobs in Bash

I have 3000 very quick jobs to run that on average take 2-3 seconds each.
The list of jobs is in a file, and I want to control how many I have open.
However, starting a job in the background (the & lines) seems to take some time itself, so some jobs are already finishing before the full INTOTAL amount has been started...
Therefore, I am not using my 32 cores efficiently.
Is there a better approach than the one below?
#!/bin/bash
#set -x
INTOTAL=28
while true
do
    NUMRUNNING=$(tasklist | grep -c Prod.exe)   # jobs currently running
    JOBS=$(wc -l < jobs.lst)                    # jobs still queued
    if [ "$JOBS" -gt 0 ]
    then
        MAXSTART=$((INTOTAL - NUMRUNNING))
        NUMTOSTART=$JOBS
        if [ "$NUMTOSTART" -gt "$MAXSTART" ]
        then
            NUMTOSTART=$MAXSTART
        fi
        echo "Starting: $NUMTOSTART"
        for ((i = 1; i <= NUMTOSTART; i++))
        do
            JOB=$(head -n1 jobs.lst)            # take the next job...
            sed -i 1d jobs.lst                  # ...and drop it from the queue
            /Prod $JOB &                        # $JOB left unquoted in case a line holds several arguments
        done
        sleep 2
    fi
    sleep 3
done
You may want to have a look at parallel, which you should be able to install on Cygwin according to the release notes. Then running the tasks in parallel can be as easy as:
parallel /Prod {} < jobs.lst
See its man page for an example of this (and have a look through the plethora of examples there for more about the many options it has).
To control how many jobs run at a time, use the -j flag. By default it will run one job per core at a time, so 32 for you. To limit it to 16, for instance:
parallel -j 16 /Prod {} < jobs.lst
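For instance, to reproduce the INTOTAL=28 cap from the original script:
parallel -j 28 /Prod {} < jobs.lst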

Parallel nested for loop in bash

I am trying to run a C executable through bash. The executable will take a different argument in each iteration, and I want to do it in parallel since I have 12 cores available.
I tried
w=1;
for i in {1..100}
do
    l=$(($i-1));
    for j in {12*l..12*i}
    do
        ./run $w/100 > "$w"_out &
    done
    expr=$w % 12;
    if ["$expr" -eq "0"]
    then wait;
    fi;
done
run is the C executable. I want to run it with an increasing argument w in each step, and I want to wait until all processes are done whenever 12 of the cores are in use. So basically, I will run 12 executables at the same time, wait until they are completed, and then move on to the next 12.
Hope I made my point clear.
Cheers.
Use GNU Parallel instead:
parallel ./myscript {1} ::: {1..100}
You can specify the number of parallel processes with the -P option, but it defaults to the number of cores in the system.
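For example, to cap it at the 12 cores mentioned in the question:
parallel -P 12 ./myscript {1} ::: {1..100}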
You can also specify -k to keep the output in the same order as the input.
To redirect the output to individual files, you can specify the output redirection, but you have to quote it so that it is not parsed by the shell. For example:
parallel ./run {1} '>' {1}_out ::: {1..10}
is equivalent to running ./run 1 > 1_out through ./run 10 > 10_out
