Generating sequences of numbers and characters with bash

I have written a script that accepts two files as input. I want to run all of the pairs in parallel, at the same time, on different CPUs.
Inputs:
x00.A x00.B
x01.A x01.B
...
x30.A x30.B
Instead of running the script 31 times (x00 through x30):
./script x00.A x00.B
./script x01.A x01.B
...
./script x30.A x30.B
I wanted to use paste and seq to generate the file names and send them to the script:
paste & seq | xargs -n1 -P 30 ./script
But I do not know how to combine letters and numbers using the paste and seq commands.

for num in $(seq -f %02.f 0 30); do
./script x$num.A x$num.B &
done
wait
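Personally, I prefer not to use GNU seq or BSD jot, and to use (ksh/bash) builtins instead. The counter has to be zero-padded to match the file names:
num=-1; while (( ++num <= 30 )); do
n=$(printf '%02d' "$num")  # zero-pad: 0 -> 00, 7 -> 07
./script "x$n.A" "x$n.B" &
done
wait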
The final wait makes the shell block until all of the background jobs have finished; the jobs themselves run concurrently, and the OS spreads them across your available CPU cores. So, if you need the output of ./script (for example, in a later step), you must keep the wait.
Putting the jobs into the background with & is the simplest way to get parallelism. If you really want to exercise any sort of control over lots of backgrounded jobs like that, you will need some sort of framework like GNU Parallel instead.

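You can use pure bash (version 4 or later, for the zero-padded {00..30}) to generate the sequence, and let xargs pass two file names to each ./script call:
printf "%s %s\n" x{00..30}.{A..B} | xargs -n 2 -P 30 ./script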
Happy holidays!

Related

Shortest shell code to run both traceroute and traceroute6

I'd like to run both traceroute -w2 and traceroute6 -w2, sequentially, in a shell script, to try out multiple different hosts.
A naive approach may just use a temporary variable to gather all the hosts (e.g., set HOSTS to ordns.he.net one.one.one.one google-public-dns-a.google.com) and then pipe it to each command individually, like echo $HOSTS | xargs -n1 traceroute -w2 et al. However, this would work differently in tcsh than in bash, and it may be prone to mistakes if you want to add more commands (you'd be adding them as code rather than as a list of things to do). I'm thinking there's some better way to join the list of commands (e.g., a command name with a single parameter) with the list of arguments (e.g., hostnames in our example), so that the shell executes every possible combination.
I've tried doing some combination of xargs -n1 (for hosts) and xargs -n2 (for commands with one parameter) piping into each other, but it didn't really make much sense and didn't work.
I'm looking for a solution that doesn't use any GNU tools and would work in a base OpenBSD install (if necessary, perl is part of the base OpenBSD, so, it's available as well).
If you have perl:
perl -e 'for(@ARGV){ print qx{ traceroute -w2 -- $_; traceroute6 -w2 -- $_ } }' google.com debian.org
As for a better way to join together the list of commands (e.g., a command name with a single parameter) with the list of arguments (e.g., hostnames) the answer could be GNU Parallel, which is built for doing just that:
parallel "{1}" -w2 -- "{2}" ::: traceroute traceroute6 ::: google.com debian.org
If you want different arguments for each command, you can do:
parallel eval "{1}" -- "{2}" ::: "traceroute -a -w2" "traceroute6 -w2" ::: google.com debian.org
The eval is needed because GNU Parallel quotes all input, and while you normally want that, we do not want that in this case.
But since that is a GNU tool, it is out of scope for your question. It is only included here for other people that read your question and who do not have that limitation.
Keeping it simple:
#!/bin/sh
set -- host1 host2 host3 host4 ...
for host do traceroute -w2 -- "$host"; done
for host do traceroute6 -w2 -- "$host"; done
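If you want the "every command times every host" combination that the question asks for, without GNU tools, two nested loops in plain POSIX sh are enough. Below is a minimal sketch; run_combinations is a hypothetical helper name, the commands are passed as one space-separated string (so they must not contain spaces themselves), and echo turns it into a dry run.

```shell
#!/bin/sh
# run_combinations "CMD..." HOST...: hypothetical helper, plain POSIX sh.
# Runs every command in the first argument against every host, one after
# another: all hosts for the first command, then all hosts for the next.
# "echo" makes this a dry run; remove it to really run traceroute(6).
run_combinations() {
    cmds=$1
    shift
    for cmd in $cmds; do    # unquoted on purpose: split the command list
        for host do         # POSIX shorthand for: for host in "$@"
            echo "$cmd" -w2 -- "$host"
        done
    done
}

run_combinations "traceroute traceroute6" ordns.he.net one.one.one.one
```

Adding a command then means adding a word to the first argument rather than another line of code, which is the property the question was after.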
With GNU Parallel, the final solution for the problem at hand would be something like the following snippet, using tcsh syntax and the OS X traceroute and traceroute6:
( history 1 ; parallel --keep-order -j4 eval \{1} -w1 -- \{2} '2>&1' ::: "traceroute -a -f1" traceroute6 ::: ordns.he.net ns{1,2,3,4,5}.he.net one.one.one.one google-public-dns-a.google.com resolver1.opendns.com ; history 1 ) |& mail -s "traceroute: ordns.he.net et al from X" recipient@example.org -f sender@example.org

For loop in parallel

Is there a quick, easy, and efficient way of running iterations in this for loop in parallel?
for i in `seq 1 5000`; do
repid="$(printf "%05d" "$i")"
inp="${repid}.inp"
out="${repid}.out"
/command "$inp" "$out"
done
If you want to take advantage of all your lovely CPU cores that you paid Intel so handsomely for, turn to GNU Parallel:
seq -f "%05g" 5000 | parallel -k echo command {}.inp {}.out
If you like the look of that, run it again without the -k (which keeps the output in order) and without the echo. You may need to enclose the command in single quotes:
seq -f "%05g" 5000 | parallel '/command {}.inp {}.out'
By default it runs one job per CPU core; if you want, say, 32 in parallel, use:
seq ... | parallel -j 32 ...
If you want an "estimated time of arrival", use:
parallel --eta ...
If you want a progress meter, use:
parallel --progress ...
If you have bash version 4+, it can zero-pad brace expansions. And if your ARG_MAX is big enough, you can more simply use:
parallel 'echo command {}.inp {}.out' ::: {00001..05000}
You can check your ARG_MAX on macOS/BSD with:
sysctl -a kern.argmax
(or, on most systems, with getconf ARG_MAX). It tells you how many bytes long your argument list can be. You are going to need 5,000 numbers at 5 digits plus a separator byte each, so 30,000 bytes minimum.
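The arithmetic above can be checked against the actual limit with a few lines of shell. This is a rough sketch: the 30,000 bytes is only a lower bound, since the environment and parallel's own arguments count towards the limit too, and it assumes getconf ARG_MAX is available (it is on Linux and macOS).

```shell
#!/bin/sh
# Back-of-the-envelope check for the ::: {00001..05000} variant.
# 5,000 arguments of 5 digits each, plus one terminating byte per argument.
# This is a lower bound: the environment and other arguments also count.
needed=$(( 5000 * (5 + 1) ))
limit=$(getconf ARG_MAX)    # on macOS this should match sysctl kern.argmax
echo "need at least $needed bytes; ARG_MAX is $limit"
if [ "$needed" -lt "$limit" ]; then
    echo "the brace-expansion form should fit"
else
    echo "too long: pipe seq into parallel instead"
fi
```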
If you are on macOS, you can install GNU Parallel with homebrew:
brew install parallel
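Alternatively, without GNU Parallel, put each command in the background with & and wait for all of them at the end. Note that this starts all 5,000 processes at once rather than one per CPU core:
for i in $(seq 1 5000); do
repid="$(printf "%05d" "$i")"
inp="${repid}.inp"
out="${repid}.out"
/command "$inp" "$out" &
done
wait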
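If GNU Parallel is not available but you still want to cap the number of simultaneous jobs, bash 4.3 and newer have wait -n, which returns as soon as any one background job exits. The following is a minimal sketch under that assumption; run_limited is a hypothetical helper, not part of bash or GNU Parallel.

```shell
#!/bin/bash
# run_limited MAXJOBS CMD ARG...: hypothetical helper (needs bash 4.3+ for wait -n).
# Runs "CMD ARG" once per ARG in the background, with at most MAXJOBS
# running at any time, then waits for the remaining jobs.
run_limited() {
    local maxjobs=$1 cmd=$2 arg running=0
    shift 2
    for arg in "$@"; do
        if [ "$running" -ge "$maxjobs" ]; then
            wait -n                   # block until any one background job exits
            running=$((running - 1))
        fi
        "$cmd" "$arg" &
        running=$((running + 1))
    done
    wait                              # wait for the jobs still running
}
```

For the loop in the question, one option is to wrap /command in a small function (the name job is made up here), e.g. job() { /command "$1.inp" "$1.out"; }, and then call run_limited 8 job $(seq -f %05g 5000). Unlike GNU Parallel, this does nothing about output ordering or progress reporting.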

Substitute part of the string in a loop for a 1000 times in sh/bash

I have a sh script that, when invoked with the command ./textgenerate text0, generates a file text0.txt.
Now I need text0.txt, text1.txt, ..., text1000.txt. How could I write a script to do that, i.e., how do I replace the 0 part with a changing variable, say from 0 to 1000?
I know a certain "trick" to implement this, for example generating the command 1000 times using Microsoft Excel and pasting it into the shell. But is there an elegant and efficient way to do this?
for i in {0..1000}; do
./textgenerate "text${i}"
done
This loops over the range [0..1,000] and assigns each value to $i in the body of the loop. ./textgenerate will be invoked 1,001 times. Note that {0..1000} is a bash feature, so run this with bash rather than a plain POSIX sh.
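Since the question mentions sh, here is a sketch of the same loop without brace expansion, which is not part of POSIX sh. It counts with arithmetic expansion instead; count_run is a hypothetical helper name chosen for illustration.

```shell
#!/bin/sh
# count_run LAST CMD [ARG...]: hypothetical helper in plain POSIX sh.
# Runs "CMD ARG... textN" for N = 0, 1, ..., LAST, one after another.
count_run() {
    last=$1
    shift
    i=0
    while [ "$i" -le "$last" ]; do
        "$@" "text$i"
        i=$((i + 1))
    done
}
```

For the question this would be count_run 1000 ./textgenerate, which, like the bash loop, invokes ./textgenerate 1,001 times.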
Very simply and in parallel with GNU Parallel (./textgenerate adds the .txt itself, so only pass textN):
parallel ./textgenerate text{} ::: {0..1000}
Or, if you don't have a recent bash to expand the {0..1000}, you could equally do this:
seq 0 1000 | parallel ./textgenerate text{}
And, if you want to see what it would do, without actually doing anything:
parallel --dry-run ... as above ...
And, if you want a progress bar:
parallel --bar ... as above ...
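You can also let printf do the looping: printf reuses its format once per argument, printing one command line per number, and . <(...) then runs those lines in the current shell (bash, since <(...) is not POSIX):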
. <(printf "./textgenerate text%s\n" {0..1000})
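Before sourcing 1,001 generated lines, it can help to look at a few of them. This small sketch captures the printf output for 0..2 only, to show how the format string is reused once per argument.

```shell
#!/bin/bash
# printf reuses its format string until all arguments are used up, so N
# numbers give N command lines. Capturing them first (here for 0..2) shows
# exactly what ". <(printf ...)" would go on to execute.
cmds=$(printf './textgenerate text%s\n' {0..2})
printf '%s\n' "$cmds"
```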

Parallel nested for loop in bash

I am trying to run a C executable through bash. The executable will take a different argument in each iteration, and I want to do it in parallel since I have 12 cores available.
I tried
w=1;
for i in {1..100}
do
l=$(($i-1));
for j in {12*l..12*i}
do
./run $w/100 > "$w"_out &
done
expr=$w % 12;
if ["$expr" -eq "0"]
then wait;
fi;
done
run is the C executable. I want to run it with an increasing argument w in each step, and I want to wait until all processes are done if 12 of the cores are in use. So basically, I will run 12 executables at the same time, wait until they are completed, and then move on to the next 12.
Hope I made my point clear.
Cheers.
Use GNU Parallel instead:
parallel ./run {1} ::: {1..100}
You can specify the number of parallel processes with the -P option (also spelled -j); by default it runs one job per CPU core.
You can also specify -k to keep the output in the same order as the input, and redirect that output to a file.
To redirect the output to individual files, you can include the output redirection in the command, but you have to quote it so that it is passed to parallel instead of being interpreted by your shell. For example:
parallel ./run {1} '>' {1}_out ::: {1..10}
is equivalent to running ./run 1 > 1_out through ./run 10 > 10_out.
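For comparison, the batch scheme described in the question (start 12, wait for all of them, start the next 12) can be written in plain bash roughly as follows. This is a sketch; run_batches is a hypothetical helper. One drawback of batches is that cores sit idle until the slowest job of each batch finishes, whereas GNU Parallel starts a new job as soon as any one finishes.

```shell
#!/bin/bash
# run_batches TOTAL BATCH CMD [ARG...]: hypothetical helper. Runs
# "CMD ARG... w" for w = 1..TOTAL in the background, BATCH at a time,
# and waits for each whole batch before starting the next one.
run_batches() {
    local total=$1 batch=$2 w
    shift 2
    for (( w = 1; w <= total; w++ )); do
        "$@" "$w" &
        if (( w % batch == 0 )); then
            wait              # this batch is full: let all of it finish
        fi
    done
    wait                      # the last batch may be smaller than BATCH
}
```

To mirror the question, you could define a wrapper (the name job is made up) such as job() { ./run "$1/100" > "${1}_out"; } and call run_batches 100 12 job.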

Looping files in bash

I want to loop over these kinds of files, where the files with the same Sample_ID have to be used together:
Sample_51770BL1_R1.fastq.gz
Sample_51770BL1_R2.fastq.gz
Sample_52412_R1.fastq.gz
Sample_52412_R2.fastq.gz
e.g. Sample_51770BL1_R1.fastq.gz and Sample_51770BL1_R2.fastq.gz are used together in one command to create an output.
Similarly, Sample_52412_R1.fastq.gz and Sample_52412_R2.fastq.gz are used together to create output.
I want to write a for loop in bash to iterate over and create output.
sourcedir=/sourcepath/
destdir=/destinationpath/
bwa-0.7.5a/bwa mem -t 4 human_g1k_v37.fasta Sample_52412_R1.fastq.gz Sample_52412_R2.fastq.gz>$destdir/Sample_52412_R1_R2.sam
How should I pattern match the file names Sample_ID_R1 and Sample_ID_R2 to be used in one command?
Thanks,
for fname in *_R1.fastq.gz
do
base=${fname%_R1*}
bwa-0.7.5a/bwa mem -t 4 human_g1k_v37.fasta "${base}_R1.fastq.gz" "${base}_R2.fastq.gz" >"$destdir/${base}_R1_R2.sam"
done
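A slightly more defensive variant of the loop above first pairs the files and skips any sample whose _R2 mate is missing, so bwa is never started with a nonexistent input. This is a sketch; pair_reads is a hypothetical helper, it assumes the file names contain no whitespace (its output is space-separated), and it turns on bash's nullglob option, which stays set after the call.

```shell
#!/bin/bash
# pair_reads DIR: hypothetical helper. Prints "R1 R2 SAMPLE" for every
# *_R1.fastq.gz in DIR whose _R2 mate exists; warns about the rest.
# Assumes file names contain no whitespace (the output is space-separated).
pair_reads() {
    local r1 base
    shopt -s nullglob                 # no matches: skip the loop entirely
    for r1 in "$1"/*_R1.fastq.gz; do
        base=${r1%_R1.fastq.gz}       # directory + Sample_ID
        if [ -e "${base}_R2.fastq.gz" ]; then
            printf '%s %s %s\n' "$r1" "${base}_R2.fastq.gz" "${base##*/}"
        else
            echo "skipping ${base##*/}: no _R2 file" >&2
        fi
    done
}
```

It could then feed the bwa command from the question, e.g. pair_reads . | while read -r r1 r2 sample; do bwa-0.7.5a/bwa mem -t 4 human_g1k_v37.fasta "$r1" "$r2" < /dev/null > "$destdir/${sample}_R1_R2.sam"; done.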
In the comments, you ask about running several, but not too many, jobs in parallel. Below is my first stab at that:
#!/bin/bash
# Limit background jobs to no more than $maxproc at once.
maxproc=3
destdir=/destinationpath/
for fname in *_R1.fastq.gz
do
while [ $(jobs -r | wc -l) -ge "$maxproc" ]
do
sleep 1
done
base=${fname%_R1*}
echo "starting new job with ongoing=$(jobs -r | wc -l)"
bwa-0.7.5a/bwa mem -t 4 human_g1k_v37.fasta "${base}_R1.fastq.gz" "${base}_R2.fastq.gz" >"$destdir/${base}_R1_R2.sam" &
done
wait  # wait for the last jobs to finish
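The optimal value of maxproc will depend on how many processors your PC has. Since each bwa mem process already runs 4 threads (-t 4), maxproc times 4 should roughly not exceed your number of cores. You may need to experiment to find what works best.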
Note that the above script uses jobs, a shell builtin, inside a command substitution and relies on bash's behavior there. Thus, it has to be run under bash (as the #!/bin/bash line requests), not under sh, which is dash on Debian-like distributions.
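As a starting point for that experimentation, maxproc can be derived from the number of online CPUs. This is a rough sketch that assumes each job keeps about 4 threads busy (matching bwa mem -t 4) and that getconf _NPROCESSORS_ONLN is available, which it is on Linux and macOS.

```shell
#!/bin/bash
# Sketch: derive maxproc from the number of online CPUs, assuming each
# job is "bwa mem -t 4", i.e. roughly 4 busy threads per job.
threads_per_job=4
cores=$(getconf _NPROCESSORS_ONLN)    # online CPUs (Linux, macOS)
maxproc=$(( cores / threads_per_job ))
if [ "$maxproc" -lt 1 ]; then
    maxproc=1                         # always allow at least one job
fi
echo "cores=$cores maxproc=$maxproc"
```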

Resources