I have a bash script that looks like the one below.
$TOOL is another script which runs twice with different inputs (VAR1 and VAR2).
#Iteration 1
${TOOL} -ip1 ${VAR1} -ip2 ${FINAL_PML}/$1$2.txt -p ${IP} -output_format ${MODE} -o ${FINAL_MODE_DIR1}
rename mods mode_c_ ${FINAL_MODE_DIR1}/*.xml
#Iteration 2
${TOOL} -ip1 ${VAR2} -ip2 ${FINAL_PML}/$1$2.txt -p ${IP} -output_format ${MODE} -o ${FINAL_MODE_DIR2}
rename mods mode_c_ ${FINAL_MODE_DIR2}/*.xml
Can I run these two iterations in parallel inside a bash script, without submitting them to a queue?
If I read this right, what you want is to run them in the background.
c.f. https://linuxize.com/post/how-to-run-linux-commands-in-background/
More importantly, if you are going to be writing scripts, PLEASE read the following closely:
https://www.gnu.org/software/bash/manual/html_node/index.html#SEC_Contents
https://mywiki.wooledge.org/BashFAQ/001
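Applied to the script in the question, a minimal sketch could look like the following. run_iteration is a placeholder standing in for the real ${TOOL} + rename pair (the sleep and the /tmp marker files just simulate the work so the skeleton is runnable):

```shell
#!/bin/bash
# Minimal sketch: launch both iterations in the background, then wait
# for both to finish before the script continues.
run_iteration() {
    # The real body would be something like:
    #   ${TOOL} -ip1 "$1" -ip2 "${FINAL_PML}/..." -p "${IP}" -o "$2"
    #   rename mods mode_c_ "$2"/*.xml
    sleep 1                           # simulate work
    touch "/tmp/par_demo_$1.done"     # record completion
}

run_iteration VAR1 DIR1 &   # iteration 1 in the background
pid1=$!
run_iteration VAR2 DIR2 &   # iteration 2 in the background
pid2=$!
wait "$pid1" "$pid2"        # block until both background jobs complete
```

The `&` puts each command in the background and `$!` captures its PID, so the final `wait` only returns once both iterations are done.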
I am wondering how to wait for any process to finish in macOS, since wait -n doesn't work. I have a script doing several things, and in some point it will enter a loop calling another script to the background to exploit some parallelism, but not more than X times since it wouldn't be efficient. Thus, I need to wait for any child process to finish before creating new processes.
I have seen this question but it doesn't answer the "any" part; it just says how to wait for a specific process to finish.
I've thought of storing all the PIDs and actively checking whether they're still running with ps, but that's slapdash and resource-consuming. I also thought about upgrading bash to a newer version (if that's even possible on macOS without breaking how bash already works), but I would be very disappointed if there were no other way to wait for any child process to finish; it's such a basic feature... Any ideas?
A basic version of my code would look like this:
for vid_file in $VID_FILES
do
    my_script.sh "$vid_file" other_args &
    ((TOTAL_PROCESSES=TOTAL_PROCESSES+1))
    if [ "$TOTAL_PROCESSES" -ge "$MAX_PROCESS" ]; then
        wait -n
        ((TOTAL_PROCESSES=TOTAL_PROCESSES-1))
    fi
done
My approach to substituting for wait -n, which is neither elegant nor performant:
NUM_PROCC=$MAX_PROCESS
while [ "$NUM_PROCC" -ge "$MAX_PROCESS" ]
do
    sleep 5
    NUM_PROCC=$(ps | grep "my_script.sh" | wc -l | tr -d " \t")
    # the grep command itself counts as one match, so subtract it
    ((NUM_PROCC=NUM_PROCC-1))
done
PS: This question could be closed and merged with the one I mentioned above. I've just created this new one because stackoverflow wouldn't let me comment or ask...
PS2: I do understand that my objective could be achieved by other means. If you don't have an answer for the specific question itself but rather a workaround, please let other people answer the question about "waiting any" since it would be very useful for me/everyone in the future as well. I will of course welcome and be thankful for the workaround too!
Thank you in advance!
It seems like you just want to limit the number of processes that are running at the same time. Here's a rudimentary way to do it with bash <= 4.2:
#!/bin/bash
MAX_PROCESS=2
INPUT_PATH=/somewhere

for vid_file in "$INPUT_PATH"/*
do
    while [[ "$(jobs -pr | wc -l)" -ge "$MAX_PROCESS" ]]; do sleep 1; done
    my_script.sh "$vid_file" other_args &
done
wait
Here's the bash >= 4.3 version:
#!/bin/bash
MAX_PROCESS=2
INPUT_PATH=/somewhere

for vid_file in "$INPUT_PATH"/*
do
    [[ "$(jobs -pr | wc -l)" -ge "$MAX_PROCESS" ]] && wait -n
    my_script.sh "$vid_file" other_args &
done
wait
GNU make has parallelization capabilities and the following Makefile should work even with the very old make 3.81 that comes with macOS. Replace the 4 leading spaces before my_script.sh by a tab and store this in a file named Makefile:
.PHONY: all $(VID_FILES)
all: $(VID_FILES)
$(VID_FILES):
    my_script.sh "$@" other_args
And then to run 8 jobs max in parallel:
$ make -j8 VID_FILES="$VID_FILES"
Make can do even better: avoid redoing things that have already been done:
TARGETS := $(patsubst %,.%.done,$(VID_FILES))
.PHONY: all clean
all: $(TARGETS)
$(TARGETS): .%.done: %
    my_script.sh "$<" other_args
    touch "$@"
clean:
    rm -f $(TARGETS)
With this last version, an empty tag file .foo.done is created for each processed video foo. If you later re-run make and video foo has not changed, it will not be re-processed. Type make clean to delete all the tag files. Again, do not forget to replace the leading spaces by a tab.
Suggestion 1: Completion indicator file
Suggesting to add a task-completion indicator file to your my_script.sh, created as its last step, like this:
touch "$0.$(date +%F_%T).done"
And in your deployment script, test whether the completion indicator file exists. Note that a glob must not be quoted if it is to expand, so the test below uses compgen -G:
rm -f my_script.sh.*.done
my_script.sh "$vid_file" other_args &
while ! compgen -G "my_script.sh.*.done" > /dev/null; do
    sleep 5
done
Don't forget to clean up the completion indicator files.
Advantages for this approach:
Simple
Supported in all shells
Retain a history/audit trail on completion
Disadvantages for this approach:
Requires modification to the original script my_script.sh
Requires cleanup of the indicator files
Requires a polling loop
Suggestion 2: Using wait command with pgrep command
Suggesting to learn more about wait command here.
Suggesting to learn more about pgrep command here.
my_script.sh "$vid_file" other_args &
wait $(pgrep -f "my_script.sh $vid_file")
Advantages for this approach:
Simple
Readable
Disadvantages for this approach:
pgrep -f may match processes from other users running the same command at the same time
The wait builtin is specific to bash (and possibly some other shells), so check what your shell supports
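A runnable sketch of Suggestion 2, with a sleep standing in for my_script.sh. Here pgrep's -P $$ option restricts the match to children of the current shell, which also sidesteps the multi-user issue:

```shell
#!/bin/bash
# Sketch of the wait + pgrep pattern. "sleep 1" stands in for
# my_script.sh "$vid_file" other_args.
sleep 1 &
bg_pid=$!
# -P $$ matches only children of this shell; -x matches the exact
# command name, so unrelated processes are not picked up.
wait "$(pgrep -P $$ -x sleep)"
job_done=yes   # reached only after the background job has exited
```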
With GNU Parallel it would look something like:
parallel my_script.sh {} other_args ::: $VID_FILES
I need to run a set of calculations, changing one parameter each time. A calculation directory contains a control file named 'test.ctrl', a job submission file named 'job-test', and a bunch of data files. Each calculation should be submitted with the same control file name (written inside job-test), and the output is written to those data files without changing their names, which creates an overwriting problem. For this reason, I want to automate the job submission process with a bash script so that I don't need to submit each calculation by hand.
As an example, I have done the first calculation in directory b1-k1-a1 (I choose this format of dir names to indicate calc. parameters). This test.ctrl file has the parameters:
Beta=1
Kappa=1
Alpha=0 1
and I submitted this job using the 'sbatch job-test' command. For the following calculations, my code should copy this whole directory under the name bX-kY-aZ, make the changes in the control file, and finally submit the job. I naively tried this by writing the whole thing in the job-test file, as you can see in the MWE below:
#!/bin/sh
#SBATCH -N 1
#SBATCH -n 1
#SBATCH --time=0:15:00 ##hh:mm:ss
for n in $(seq 0 5)
do
for m in $(seq 0 5)
do
for v in $(seq 0 5)
do
mkdir b$n-k$m-a$v
cd b$n-k$m-a$v
cp ~/home/b01-k1-a01/* .
sed "s/Beta=1/Beta=$n/" test.ctrl
sed "s/Kappa=1/Kappa=$m/" test.ctrl
sed "s/Alpha=0 1/Alpha=0 $v/" test.ctrl
cd ..<<EOF
EOF
mpirun soft.x test.ctrl
sleep 5
done
done
done
I would appreciate suggestions on how to make this work.
It worked after I moved cd .. to the very end of the loop body and replaced the sed calls with a heredoc, as suggested in the comments. Hence this works now:
#!/bin/sh
#SBATCH -N 1
#SBATCH -n 1
#SBATCH --time=0:15:00 ##hh:mm:ss
for n in $(seq 0 5)
do
for m in $(seq 0 5)
do
for v in $(seq 0 5)
do
mkdir b$n-k$m-a$v
cd b$n-k$m-a$v
cp ~/home/b01-k1-a01/* .
cat >test.ctrl <<EOF
Beta=$n
Kappa=$m
Alpha=0 $v
EOF
mpirun soft.x test.ctrl
sleep 5
cd ..
done
done
done
The immediate problem is that sed without any options does not modify the file at all; it just prints the results to standard output.
It is frightfully unclear what you were hoping the here document was accomplishing. cd does not read its standard input, so it wasn't accomplishing anything at all, anyway.
#!/bin/sh
#SBATCH -N 1
#SBATCH -n 1
#SBATCH --time=0:15:00 ##hh:mm:ss
for n in $(seq 0 5); do
    for m in $(seq 0 5); do
        for v in $(seq 0 5); do
            mkdir "b$n-k$m-a$v"
            cd "b$n-k$m-a$v"
            cp ~/home/b01-k1-a01/* .
            sed -e "s/Beta=1/Beta=$n/" \
                -e "s/Kappa=1/Kappa=$m/" \
                -e "s/Alpha=0 1/Alpha=0 $v/" ~/home/b01-k1-a01/test.ctrl >test.ctrl
            mpirun soft.x test.ctrl
            cd ..
            sleep 5
        done
    done
done
Notice also the merging of multiple sed commands into a single script (though as noted elsewhere, maybe printf would be even better if that's everything which you have in the configuration file).
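A sketch of that printf alternative: generate test.ctrl directly from the loop variables instead of patching a template with sed (n, m and v here are example values standing in for the loop counters):

```shell
#!/bin/bash
# Write the three-line control file directly; no template needed.
n=2 m=3 v=4   # example values standing in for the loop counters
printf 'Beta=%s\nKappa=%s\nAlpha=0 %s\n' "$n" "$m" "$v" > test.ctrl
```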
I have large pcapng files, and I want to split them based on my desired Wireshark filters. I want to split the files with a bash script using pcapsplitter, but when I use a loop, it always gives me the same file.
I have written a small script:
#!/bin/bash
for i in {57201..57206}
do
    mkdir destination/$i
done

tcp="tcp port "
for i in {57201..57206}
do
    tcp="$tcp$i"
    pcapsplitter -f file.pcapng -o destination/$i -m bpf-filter -p $tcp
done
The question is: can I use bash for my goal or not?
If yes, why does it not work?
Definitely, this is something Bash can do.
Regarding your script, the first thing I can think of is this line :
pcapsplitter -f file.pcapng -o destination/$i -m bpf-filter -p $tcp
where the value of $tcp is actually tcp port 57201 (and the following numbers on the next rounds). However, without quotes, you're actually passing only tcp to the -p parameter.
It should work better after you've changed this line into :
pcapsplitter -f file.pcapng -o destination/$i -m bpf-filter -p "$tcp"
NB: as a general advice, it's usually safer to double-quote variables in Bash.
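A small demonstration of what the missing quotes do to the argument count (count_args is a throwaway helper for this demo, not part of pcapsplitter):

```shell
#!/bin/bash
tcp="tcp port 57201"
count_args() { echo $#; }        # prints the number of arguments received
unquoted=$(count_args $tcp)      # word-splitting: 3 separate arguments
quoted=$(count_args "$tcp")      # quoted: a single argument
```

With the unquoted expansion, -p would only receive the first word, "tcp"; the quoted form delivers the whole filter string.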
NB2 : you don't need those 2 for loops. Here is how I'd rewrite your script :
#!/bin/bash
for portNumber in {57201..57206}; do
destinationDirectory="destination/$portNumber"
mkdir "$destinationDirectory"
thePparameter="tcp port $portNumber"
pcapsplitter -f 'file.pcapng' -o "$destinationDirectory" -m bpf-filter -p "$thePparameter"
done
I have a script that was kicking off ~200 jobs for each sub-analysis. I realized that a job array would probably be much better for this, for several reasons. It seems simple enough but is not quite working for me. My input files are not numbered, so, following examples I've seen, I do this first:
INFILE=`sed -n ${SGE_TASK_ID}p <pathto/listOfFiles.txt`
My qsub command takes in quite a few variables as it is both pulling and outputting to different directories. $res does not change, however $INFILE is what I am looping through.
qsub -q test.q -t 1-200 -V -sync y -wd ${res} -b y perl -I /master/lib/ myanalysis.pl -c ${res}/${INFILE}/configFile-${INFILE}.txt -o ${res}/${INFILE}/
Since this was not working, I was curious as to what exactly was being passed. So I did an echo on this and saw that it only seems to expand up to the first time $INFILE is used. So I get:
perl -I /master/lib/ myanalysis.pl -c mydirectory/fileABC/
instead of:
perl -I /master/lib/ myanalysis.pl -c mydirectory/fileABC/configFile-fileABC.txt -o mydirectory/fileABC/
Hoping for some clarity on this and welcome all suggestions. Thanks in advance!
UPDATE: It doesn't look like $SGE_TASK_ID is set on the cluster. I looked for any variable that could be used for an array ID and couldn't find anything. If I see anything else I will update again.
Assuming you are using a grid engine variant, SGE_TASK_ID should be set within the job. It looks like you are expecting it to be set to some useful value before you run qsub. Submitting a script like this would do roughly what you appear to be trying to do:
#!/bin/bash
INFILE=$(sed -n ${SGE_TASK_ID}p <pathto/listOfFiles.txt)
exec perl -I /master/lib/ myanalysis.pl -c ${res}/${INFILE}/configFile-${INFILE}.txt -o ${res}/${INFILE}/
Then submit this script with
res=${res} qsub -q test.q -t 1-200 -V -sync y -wd ${res} myscript.sh
I want to pass arguments to a script in the form
./myscript.sh -r [1,4] -p [10,20,30]
where in myscript.sh I do:
echo $@
But I'm getting the output as
-r 1 4 -p 1 2 3
How do I get output in the form of
-r [1,4] -p [10,20,30]
I'm using Ubuntu 12.04 and bash version 4.2.37
You have files named 1 2 3 & 4 in your working directory.
Use more quotes.
./myscript.sh -r "[1,4]" -p "[10,20,30]"
[1,4] gets expanded by bash to filenames named 1 or , or 4 (whichever are actually present on your system).
Similarly, [10,20,30] gets expanded to filenames named 1 or 0 or , or 2 or 3.
On a similar note, you should also change echo $@ to echo "$@".
On another note, if you really want to distinguish between the arguments, use printf '%s\n' "$@" instead of just echo "$@".
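The expansion is easy to reproduce in a scratch directory (mktemp -d is used here just to get an empty directory to play in):

```shell
#!/bin/bash
dir=$(mktemp -d) && cd "$dir"
touch 1 2 3 4                 # the files that make the globs match
unquoted=$(echo [1,4])        # bracket glob matches files named 1 and 4
quoted=$(echo "[1,4]")        # quoting keeps the brackets literal
```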
You can turn off filename expansion
set -f
./myscript.sh -r [1,4] -p [10,20,30]
Don't expect other users to want to do this if you share your script.
The best answer is anishane's: just quote the arguments
./myscript.sh -r "[1,4]" -p "[10,20,30]"
You can also just escape the brackets, like this:
./verify.sh -r \[1,4\] -p \[10,20,30\]
You can then print the arguments using echo "$@".