include bash script arguments when submitting via bsub

I have the following shell script.
#!/bin/bash --login
#BSUB -q q_ab_mpc_work
#BSUB -J psipred
#BSUB -W 01:00
#BSUB -n 64
#BSUB -o psipred.out
#BSUB -e psipred.err
module load compiler/gnu-4.8.0
module load R/3.0.1
export OMP_NUM_THREADS=4
code=${HOME}/Phd/script_dev/rfpipeline.sh
MYPATH=$HOME/Phd/script_dev/
cd ${MYPATH}
${code} myfile.txt
which I can submit to the cluster with bsub:
bsub < myprogram.sh
However, suppose I change the last line of my program to:
${code} $1
so that a command-line argument specifies the file. How can I pass this argument through bsub?
I have tried:
bsub < myprogram.sh myfile.text
but bsub will not accept myfile.text as a parameter for the script.
I have also tried:
bsub <<< myprogram.sh myfile.text
./myprogram.sh myfile.text | bsub
bsub "sh ./myprogram.sh myfile.text"
What do I need to do?

Can I answer my own question?
It seems that I can use sed to modify the file on the fly. My original file is now:
#!/bin/bash --login
#BSUB -q q_ab_mpc_work
#BSUB -J psipred
#BSUB -W 01:00
#BSUB -n 64
#BSUB -o psipred.out
#BSUB -e psipred.err
module load compiler/gnu-4.8.0
module load R/3.0.1
export OMP_NUM_THREADS=4
code=${HOME}/Phd/script_dev/rfpipeline.sh
MYPATH=$HOME/Phd/script_dev/
cd ${MYPATH}
${code} myfile
and I wrote a bash script, sender.sh, to both replace the placeholder myfile with a command-line argument and send the modified file off to bsub:
#!/bin/bash
sed "s/myfile/$1/g" < myprogram.sh | bsub
being careful to use double quotes so that the shell expands $1 instead of passing the $ to sed literally. I then simply run ./sender.sh jobfile.txt, which works!
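A variation on the same idea that avoids sed altogether: have the submitting shell build the job script with a here document, so $1 is expanded before the text reaches bsub's standard input. A minimal sketch, reusing the directives from the original script (the backslash-escaped \$HOME is deliberately left for the job to expand at run time):
#!/bin/bash
# sender.sh (heredoc variant): $1 is expanded by the submitting shell,
# so the spooled job script already contains the real file name
bsub <<EOF
#!/bin/bash --login
#BSUB -q q_ab_mpc_work
#BSUB -J psipred
#BSUB -W 01:00
#BSUB -n 64
#BSUB -o psipred.out
#BSUB -e psipred.err
module load compiler/gnu-4.8.0
module load R/3.0.1
export OMP_NUM_THREADS=4
cd \$HOME/Phd/script_dev/
\$HOME/Phd/script_dev/rfpipeline.sh $1
EOF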
Hope this helps anybody.

This answer should resolve your problem:
https://unix.stackexchange.com/questions/144518/pass-argument-to-script-then-redirect-script-as-input-to-bsub
Just pass the script, with its arguments, at the end of the bsub command.
For example:
example.sh
#!/bin/bash
export input=${1}
echo "arg input: ${input}"
bsub command:
bsub [bsub args] "path/to/example.sh arg1"
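Applied to the script from the question, that would look something like the line below. One caveat worth hedging, as LSF setups differ: when the script is passed as a command-line argument instead of on stdin, the #BSUB directives inside it are not read, so the queue and resource options have to be repeated on the bsub command line:
bsub -q q_ab_mpc_work -J psipred -W 01:00 -n 64 -o psipred.out -e psipred.err "path/to/myprogram.sh myfile.txt"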

Related

How to loop through a script with SLURM? (sbatch and srun)

I'm new to Slurm. I have a shell script that runs the same command many times with different inputs and outputs. Is there a way to loop through that script from an sbatch script, issuing one srun command per line? My thought was something along the lines of:
shell script:
#!/bin/bash
ExCommand -f input1a -b input2a -c input3a -o outputa
ExCommand -f input1b -b input2b -c input3b -o outputb
ExCommand -f input1c -b input2c -c input3c -o outputc
ExCommand -f input1d -b input2d -c input3d -o outputd
ExCommand -f input1e -b input2e -c input3e -o outpute
sbatch script
#!/bin/bash
## Job Name
#SBATCH --job-name=collectAlignmentMetrics
## Allocation Definition
## Resources
## Nodes
#SBATCH --nodes=1
## Time limit
#SBATCH --time=4:00:00
## Memory per node
#SBATCH --mem=64G
## Specify the working directory for this job
for line in shellscript
do
srun command
done
Any ideas?
Try replacing your for loop with this:
while read -r line
do
if [[ $line == \#* ]]; then continue; fi
srun $line
done < shellscript
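If the list of commands is long, a SLURM job array is an alternative worth considering, since the scheduler then does the looping for you. A rough sketch, assuming the commands live in a file called commands.txt (that name and the 1-5 range are illustrative, not from the original post):
#!/bin/bash
#SBATCH --job-name=collectAlignmentMetrics
#SBATCH --nodes=1
#SBATCH --time=4:00:00
#SBATCH --mem=64G
#SBATCH --array=1-5
# each array task picks exactly one non-comment line of the commands file
line=$(grep -v '^#' commands.txt | sed -n "${SLURM_ARRAY_TASK_ID}p")
srun $line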

How to make a loop for getting input and output

I have a command line like this:
myscript constant/tap.txt -n base.dat -c normal/sta0.grs -o normal/brs0.opm
I have 100 .grs files and I need to generate 100 .opm files.
I want to put the command above into a loop that does the following:
myscript constant/tap.txt -n base.dat -c normal/sta0.grs -o normal/brs0.opm
myscript constant/tap.txt -n base.dat -c normal/sta1.grs -o normal/brs1.opm
myscript constant/tap.txt -n base.dat -c normal/sta2.grs -o normal/brs2.opm
myscript constant/tap.txt -n base.dat -c normal/sta3.grs -o normal/brs3.opm
myscript constant/tap.txt -n base.dat -c normal/sta4.grs -o normal/brs4.opm
.
.
.
myscript constant/tap.txt -n base.dat -c normal/sta100.grs -o normal/brs100.opm
I was trying to write it like the loop below:
#!/bin/bash
# Basic until loop
counter=100
until [ $counter -gt 100 ]
do
myscript constant/tap.txt -n base.dat -c normal/sta100.grs -o normal/brs100.opm
done
echo All done
but I could not find a way to change the parameters during the loop.
In the above command these are constant for each run:
myscript constant/tap.txt -n base.dat -c
The only thing that changes in each loop is the following input and output:
normal/sta100.grs
normal/brs100.opm
I have 100 sta*.grs files in the normal folder, and I want to create 100 brs*.opm files in the same folder.
#!/bin/bash
counter=0
until ((counter>100))
do
myscript constant/tap.txt -n base.dat -c normal/sta$counter.grs -o normal/brs$counter.opm
((++counter))
done
echo 'All done'
This is an excellent use case for GNU parallel:
find normal -name 'sta*.grs' |
parallel myscript constant/tap.txt -n base.dat -c {} -o '{= s/sta/brs/; s/\.grs$/.opm/ =}'
(the {= =} replacement string turns each input name sta<N>.grs into the matching output name brs<N>.opm). The less code you write, the fewer errors you make. This also generalizes nicely to files named in more complex patterns, and you get parallelization for free (you can disable it with -j1).
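If you want to see what parallel would run before touching any files, its --dry-run flag prints the generated command lines instead of executing them:
find normal -name 'sta*.grs' |
parallel --dry-run myscript constant/tap.txt -n base.dat -c {} -o '{= s/sta/brs/; s/\.grs$/.opm/ =}'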
Instead of incrementing the counter manually, you could use a for loop like this:
for i in {0..100}; do
myscript constant/tap.txt -n base.dat -c normal/sta"$i".grs -o normal/brs"$i".opm
done
Also, consider that this will sort in an unintuitive way:
1.opm
10.opm
100.opm
11.opm
12.opm
so maybe use padded numbers everywhere with for i in {000..100}; do. This requires Bash 4.0 or newer; if you don't have that, you could do something like
for i in {0..100}; do
printf -v ipad '%03d' "$i"
myscript constant/tap.txt -n base.dat -c normal/sta"$ipad".grs \
-o normal/brs"$ipad".opm
done
where the printf line puts a padded version of the counter into the ipad variable.
(And if you have Bash older than 3.1, you can't use printf -v and have to do
ipad=$(printf '%03d' "$i")
instead.)
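If GNU coreutils is available, seq -w is yet another way to get equal-width, zero-padded counters without printf; a sketch under that assumption:
for i in $(seq -w 0 100); do
myscript constant/tap.txt -n base.dat -c normal/sta"$i".grs -o normal/brs"$i".opm
done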

Use variables as argument parameters inside qsub script

I want to pick up a number of models from a folder and use them in an sge script for an array job. So I do the following in the SGE script:
MODELS=/home/sahil/Codes/bistable/models
numModels=(`ls $MODELS|wc -l`)
echo $numModels
#$ -S /bin/bash
#$ -cwd
#$ -V
#$ -t 1-$[numModels] # Running array job over all files in the models directory.
model=(`ls $MODELS`)
echo "Starting ${model[$SGE_TASK_ID-1]}..."
But I get the following error:
Unable to read script file because of error: Numerical value invalid!
The initial portion of string "$numModels" contains no decimal number
I have also tried to use
#$ -t 1-${numModels}
and
#$ -t 1-(`$numModels`)
but none of these work. Any suggestions/alternate methods are welcome, but they must use the array job functionality of qsub.
Beware that to Bash, #$ -t 1-$[numModels] is nothing more than a comment, and qsub, which does parse such lines, performs no variable expansion on them, so numModels is never substituted.
One option is to pass the -t argument in the command line: remove it from your script:
#$ -S /bin/bash
#$ -cwd
#$ -V
model=(`ls $MODELS`)
echo "Starting ${model[$SGE_TASK_ID-1]}..."
and submit the script with
export MODELS=/home/sahil/Codes/bistable/models
qsub -t 1-$(ls $MODELS|wc -l) submit.sh
(exporting MODELS on its own line first matters: a prefix assignment on the qsub line would not be visible to the $(...) command substitution, which the shell expands before performing the assignment. The exported variable then reaches the job through the #$ -V directive.)
If you prefer a self-contained submission script, another option is to pass the whole job script to qsub through stdin with a here document. Values to be expanded at submission time are set before the here document; anything that must only be evaluated inside the job, such as $SGE_TASK_ID, is escaped with a backslash:
#!/bin/bash
MODELS=/home/sahil/Codes/bistable/models
numModels=$(ls $MODELS|wc -l)
qsub <<EOT
#$ -S /bin/bash
#$ -cwd
#$ -V
# Running an array job over all files in the models directory
#$ -t 1-$numModels
model=(\$(ls $MODELS))
echo "Starting \${model[\$SGE_TASK_ID-1]}..."
EOT
Then you execute that script directly to submit your job array (./submit.sh rather than qsub submit.sh, since the qsub command is now part of the script).

Directly pass parameters to pbs script

Is there a way to directly pass parameters to a .pbs script before submitting a job? I need to loop over a list of files indicated by different numbers and apply a script to analyze each file.
The best I've been able to come up with is the following:
#!/bin/sh
for ((i= 1; i<= 10; i++))
do
export FILENUM=$i
qsub pass_test.pbs
done
where pass_test.pbs is the following script:
#!/bin/sh
#PBS -V
#PBS -S /bin/sh
#PBS -N pass_test
#PBS -l nodes=1:ppn=1,walltime=00:02:00
#PBS -M XXXXXX@XXX.edu
cd /scratch/XXXXXX/pass_test
./run_test $FILENUM
But this feels a bit wonky. In particular, I want to avoid having to create an environment variable just to pass one value.
The qsub utility can read the script from standard input, so by using a here document you can create job scripts on the fly:
#!/bin/sh
for i in `seq 1 10`
do
cat <<EOS | qsub -
#!/bin/sh
#PBS -V
#PBS -S /bin/sh
#PBS -N pass_test
#PBS -l nodes=1:ppn=1,walltime=00:02:00
#PBS -M XXXXXX@XXX.edu
cd /scratch/XXXXXX/pass_test
./run_test $i
EOS
done
Personally, I would use a more compact version:
#!/bin/sh
for i in `seq 1 10`
do
cat <<EOS | qsub -V -S /bin/sh -N pass_test -l nodes=1:ppn=1,walltime=00:02:00 -M XXXXXX@XXX.edu -
cd /scratch/XXXXXX/pass_test
./run_test $i
EOS
done
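One design note on both versions: because EOS is unquoted, $i is expanded by the submitting shell, at submission time. Anything that should instead be evaluated inside the job has to be escaped with a backslash; for example, a sketch using the standard $PBS_O_WORKDIR variable:
cat <<EOS | qsub -
# $i expands now, on the submit host; \$PBS_O_WORKDIR expands later, in the job
cd \$PBS_O_WORKDIR
./run_test $i
EOS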
You can use the -F option, as described here:
-F
Specifies the arguments that will be passed to the job script when the script is launched. The accepted syntax is:
qsub -F "myarg1 myarg2 myarg3=myarg3value" myscript2.sh
Note: Quotation marks are required. qsub will fail with an error
message if the argument following -F is not a quoted value. The
pbs_mom server will pass the quoted value as arguments to the job
script when it launches the script.
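On the receiving side, the arguments arrive as ordinary positional parameters. A minimal sketch of the receiving script (the echo lines are illustrative; only the name myscript2.sh comes from the quoted documentation):
#!/bin/sh
# myscript2.sh, invoked as: qsub -F "myarg1 myarg2" myscript2.sh
echo "first argument: $1"
echo "second argument: $2"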
If you just need to pass numbers and run a list of jobs that are identical except for the input file number, it is better to use a job array than a for loop, since a job array puts less load on the job scheduler.
Inside the .pbs file, refer to the file number through ${PBS_ARRAYID}:
./run_test ${PBS_ARRAYID}
And to invoke it, on the command line, type:
qsub -t 1-10 pass_test.pbs
where the range after the -t option selects which array IDs to run.
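Putting that together with the script from the question, pass_test.pbs would look something like this sketch (directives copied from the original):
#!/bin/sh
#PBS -V
#PBS -S /bin/sh
#PBS -N pass_test
#PBS -l nodes=1:ppn=1,walltime=00:02:00
#PBS -M XXXXXX@XXX.edu
cd /scratch/XXXXXX/pass_test
./run_test ${PBS_ARRAYID}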

How to properly pass on an environment variable to Sun Grid Engine?

I'm trying to submit a (series of) jobs to SGE (FWIW, it's a sequence of Gromacs molecular dynamics simulations), in which all the jobs are identical except for a suffix, such as input01, input02, etc. I wrote the commands to run in a way that the suffix is properly handled by the sequence of commands.
However, I can't find a way to get the job's execution environment to receive that variable. According to the qsub man page, -v var should do it.
$ export i=19
$ export | grep ' i='
declare -x i="19"
$ env | grep '^i='
i=19
Then, I submit the following script (run.sh) to see if it's received:
if [ "x" == "x$i" ]; then
echo "ERROR: \$i not set"
else
echo "SUCCESS: \$i is set"
fi
I submit the job as follows (in the same session as the export command above):
$ qsub -N "test_env" -cwd -v i run.sh
Your job 4606 ("test_env") has been submitted
The error stream is empty, and the output stream has:
$ cat test_env.o4606
ERROR: $i not set
I also tried the following commands, unsuccessfully:
$ qsub -N "test_env" -cwd -v i -V run.sh
$ qsub -N "test_env" -cwd -V run.sh
$ qsub -N "test_env" -cwd -v i=19 -V run.sh
$ qsub -N "test_env" -cwd -v i=19 run.sh
If I add a line i=19 to the beginning of run.sh, then the output is:
$ cat test_env.o4613
SUCCESS: $i is set as 19
I'm now considering generating one file per job, each essentially the same but with an i=xx line at the top. It doesn't look very practical, but it would be a solution.
Would there be a better solution?
What I've always done is the following:
##send.sh
export a=10
qsub ./run.sh
and the script run.sh:
##run.sh
#$ -V
echo $a
When I call send.sh, the .o output file contains 10. (The exported variable reaches the job because run.sh sets the #$ -V directive, which forwards the submission environment.)
Assuming that your variable is just an incrementing counter: you can use array jobs to achieve this. SGE will set the $SGE_TASK_ID environment variable to the task number, which you can then copy to $i or use directly.
If the variable is anything else, then I think you'll have to generate multiple job scripts and submit each; that's the "solution" I use when I have loads of jobs with differing parameters.
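For the counter case, the copy mentioned above is a one-liner near the top of the job script; a sketch (the 1-19 range is an arbitrary assumption):
#!/bin/bash
#$ -cwd
#$ -t 1-19
i=$SGE_TASK_ID # use the array task id wherever $i was used before
echo "processing input$i"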
I'm not certain you can pass variables by their name through qsub. I've had success with passing values (you should probably write a front-end script for this instead of doing it interactively):
$ export ii=19
$ qsub -N "test_env" -cwd -v i=$ii run.sh
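Wrapped in a small front-end script, that becomes something like the sketch below (the name send_all.sh and the 1-19 range are assumptions):
#!/bin/bash
# send_all.sh: submit run.sh once per value, passing the value rather than the name
for ii in $(seq 1 19); do
qsub -N "test_env_$ii" -cwd -v i=$ii run.sh
done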
