bash script PBS_ARRAYID variable argument does not get passed through qsub to the job

I want to pass PBS_ARRAYID to the main argument vector (argv) through qsub, but after reading through pages of Google results I cannot get this to work. A constant argument qsubs fine.
#
#$ -cwd
#$ -S /bin/bash
#$ -j y
#$ -t 1-3
#$ -pe fah 1
var1=$(echo "$PBS_ARRAYID" -l)
const1=1
./daedalus_linux_1.3_64 $const1 $var1
I lifted the array code from the solution given here: Using a loop variable in a bash script to pass different command-line args.
From everything I have read this should work, and it does, with the exception of var1=$(echo "$PBS_ARRAYID" -l)

It turns out the answer is fairly simple: our university uses a Sun Grid Engine (SGE) queue, while the tutorials my searches turned up were all, by chance, for PBS queues. SGE exposes the array task index as $SGE_TASK_ID rather than $PBS_ARRAYID.
#
#$ -cwd
#$ -S /bin/bash
#$ -j y
#$ -t 1-9
#$ -pe fah 3
const1=1
./daedalus_linux_1.3_64 $const1 $SGE_TASK_ID
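If the same script might have to run under either scheduler, a small guard can pick whichever task-ID variable happens to be set. A minimal sketch, assuming only that one of the two variables is defined (the fallback of 1 is an arbitrary choice):
# Prefer SGE's task variable, fall back to PBS's, then to task 1
TASK_ID="${SGE_TASK_ID:-${PBS_ARRAYID:-1}}"
const1=1
./daedalus_linux_1.3_64 "$const1" "$TASK_ID"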

Related

How to submit a job array in Hoffman2 if I have a limit of 500 jobs?

I need to submit a job array of 100,000 jobs on Hoffman2, but I have a limit of 500 jobs per user. Thus, starting with job 500, I get the following error:
Unable to run job: job rejected: Only 500 jobs are allowed per user (current job count: 500). Job of user "XX" would exceed that limit. Exiting.
Right now the submission Bash code is:
#!/bin/bash
#$ -cwd
#$ -o test.joblog.LOOP.$JOB_ID
#$ -j y
#$ -l h_data=3G,h_rt=02:00:00
#$ -m n
#$ -t 1-100000
echo "STARTING TIME -- $(date) "
echo "$SGE_TASK_ID "
/u/systems/UGE8.6.4/bin/lx-amd64/qsub submit_job.sh $SGE_TASK_ID
I tried to modify my code according to some Slurm documentation (in Slurm, adding % after the range sets the number of simultaneously running tasks), but apparently it does not work on Hoffman2:
#$ -cwd
#$ -o test.joblog.LOOP.$JOB_ID
#$ -j y
#$ -l h_data=3G,h_rt=02:00:00
#$ -m n
#$ -t 1-100000%500
echo "STARTING TIME -- $(date) "
echo "$SGE_TASK_ID "
/u/systems/UGE8.6.4/bin/lx-amd64/qsub submit_job.sh $SGE_TASK_ID
Do you know how I can modify my submission Bash code so that I always have 500 jobs running?
Assuming that your submission command shows up in the ps output as
/u/systems/UGE8.6.4/bin/lx-amd64/qsub submit_job.sh
you could try something quick and dirty like:
#!/bin/bash
maxproc=490
while : ; do
    # ps -eo args prints full command lines (plain ps -e shows only the
    # executable name); the [q] in the pattern keeps grep from counting itself
    qproc=$(ps -eo args | grep -c '/u/systems/UGE8.6.4/bin/lx-amd64/[q]sub submit_job.sh')
    if [ "$qproc" -lt "$maxproc" ] ; then
        submission_code   # with correct arguments
    fi
    sleep 10   # or any interval that you feel is appropriate
done
Of course, this only shows the principle; you may need to check whether other submission commands inflate the count, and I assumed that the submission command backgrounds itself. There are more caveats, but you get the idea.
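A less fragile count could come from the scheduler itself rather than from ps, for example by asking qstat how many jobs you currently have. A sketch, assuming UGE's qstat is on the PATH and prints two header lines before the job list:
# Count this user's jobs as the scheduler sees them
qproc=$(qstat -u "$USER" | tail -n +3 | wc -l)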
A possible approach (free of busy waiting and ugliness of that kind) is to track the number of jobs on the client side, cap their total count at 500 and, each time any of them finishes, immediately start a new one to replace it. (This is, however, based on the assumption that the client script outlives the jobs.) Concrete steps:
Make the qsub tool block and (passively) wait for the completion of its remote job. Depending on the particular qsub implementation, it may have a -sync flag or something more complex may be needed.
Keep exactly 500 waiting instances of qsub around: no more and, if possible, no fewer. This can be automated by using this answer or this answer and setting MAX_PARALLELISM to 500 there. qsub itself would be started from the do_something_and_maybe_fail() function.
Here’s a copy&paste of the Bash outline from the answers linked above, just to make this answer more self-contained. Starting with a trivial and runnable harness / dummy example (with a sleep instead of a qsub -sync):
#!/bin/bash
set -euo pipefail
declare -ir MAX_PARALLELISM=500  # pick a limit
declare -i pid
declare -a pids=()
do_something_and_maybe_fail() {
  ### qsub -sync "$@" ... ###  # add the real thing here
  sleep $((RANDOM % 10))       # remove this :-)
  return $((RANDOM % 2 * 5))   # remove this :-)
}
for pname in some_program_{a..j}{00..60}; do  # 610 items
  if ((${#pids[@]} >= MAX_PARALLELISM)); then
    wait -p pid -n \
      && echo "${pids[pid]} succeeded" 1>&2 \
      || echo "${pids[pid]} failed with ${?}" 1>&2
    unset 'pids[pid]'
  fi
  do_something_and_maybe_fail &  # forking here
  pids[$!]="${pname}"
  echo "${#pids[@]} running" 1>&2
done
for pid in "${!pids[@]}"; do
  wait -n "$((pid))" \
    && echo "${pids[pid]} succeeded" 1>&2 \
    || echo "${pids[pid]} failed with ${?}" 1>&2
done
The first loop needs to be adjusted for the specific use case. An example follows, assuming that the right do_something_and_maybe_fail() implementation is in place and that one_command_per_line.txt is a list of arguments for qsub, one invocation per line, with an arbitrary number of lines. (The script could accept a file name as an argument or just read the commands from standard input, whatever works best.) The rest of the script would look exactly like the boilerplate above, keeping the number of parallel qsubs at MAX_PARALLELISM at most.
while read -ra job_args; do
  if ((${#pids[@]} >= MAX_PARALLELISM)); then
    wait -p pid -n \
      && echo "${pids[pid]} succeeded" 1>&2 \
      || echo "${pids[pid]} failed with ${?}" 1>&2
    unset 'pids[pid]'
  fi
  do_something_and_maybe_fail "${job_args[@]}" &  # forking here
  pids[$!]="${job_args[*]}"
  echo "${#pids[@]} running" 1>&2
done < /path/to/one_command_per_line.txt
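For the Hoffman2 case from the question, the missing piece is a blocking do_something_and_maybe_fail(). A minimal sketch, assuming the site's UGE qsub supports -sync y (which makes qsub wait for the job to finish and exit with its status):
do_something_and_maybe_fail() {
  # -sync y blocks until the job completes and propagates its exit status
  /u/systems/UGE8.6.4/bin/lx-amd64/qsub -sync y submit_job.sh "$@"
}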

Set number of gpus in PBS script from command line

I'm invoking a job with qsub myjob.pbs. In there, I have some logic to run my experiments, which includes running torchrun, a distributed utility for PyTorch. In that command you can set the number of nodes and the number of processes (and GPUs) per node. Depending on availability, I want to be able to invoke qsub with an arbitrary number of GPUs, so that both -l gpus= and torchrun --nproc_per_node= are set from the command-line argument.
I tried the following:
#!/bin/sh
#PBS -l "nodes=1:ppn=12:gpus=$1"
torchrun --standalone --nnodes=1 --nproc_per_node=$1 myscript.py
and invoked it like so:
qsub --pass "4" myjob.pbs
but I got the following error: ERROR: -l: gpus: expected valid integer, found '"$1"'. Is there a way to pass the number of GPUs to the script so that the PBS directives can read them?
The problem is that your shell treats PBS directives as comments, so it cannot expand arguments inside them. This means that the expansion of $1 will not occur in:
#PBS -l "nodes=1:ppn=12:gpus=$1"
Instead, you can apply the -l gpus= argument on the command line and remove the directive from your PBS script. For example:
#!/bin/sh
#PBS -l ncpus=12
set -eu
torchrun \
    --standalone \
    --nnodes=1 \
    --nproc_per_node="${nproc_per_node}" \
    myscript.py
Then just use a simple wrapper, e.g. run_myjob.sh:
#!/bin/sh
set -eu
qsub \
    -l gpus="$1" \
    -v nproc_per_node="$1" \
    myjob.pbs
This should let you specify the number of GPUs as a command-line argument (-v puts nproc_per_node into the job's environment, where the PBS script can read it):
sh run_myjob.sh 4
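Since qsub rejects non-integer gpus values outright, the wrapper is also a natural place for a quick sanity check. A hypothetical extension of run_myjob.sh:
#!/bin/sh
set -eu
# Reject empty or non-numeric arguments before they reach qsub
case "${1:-}" in
    ''|*[!0-9]*) echo "usage: $0 NUM_GPUS" >&2; exit 1 ;;
esac
qsub -l gpus="$1" -v nproc_per_node="$1" myjob.pbs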

include bash script arguments when submitting via bsub

I have the following shell script.
#!/bin/bash --login
#BSUB -q q_ab_mpc_work
#BSUB -J psipred
#BSUB -W 01:00
#BSUB -n 64
#BSUB -o psipred.out
#BSUB -e psipred.err
module load compiler/gnu-4.8.0
module load R/3.0.1
export OMP_NUM_THREADS=4
code=${HOME}/Phd/script_dev/rfpipeline.sh
MYPATH=$HOME/Phd/script_dev/
cd ${MYPATH}
${code} myfile.txt
which I can submit to the cluster with bsub:
bsub < myprogram.sh
However, if I change the last line of my program to:
${code} $1
so that a command-line argument specifies the file, how can I pass that argument through bsub?
I have tried:
bsub < myprogram.sh myfile.text
however bsub will not accept myfile.text as a bash parameter.
I have also tried
bsub <<< myprogram.sh myfile.text
./myprogram.sh myfile.text | bsub
bsub "sh ./myprogram.sh myfile.text"
what do I need to do?
Can I answer my own question?
It seems that I can use sed to modify the file on the fly. My original file is now:
#!/bin/bash --login
#BSUB -q q_ab_mpc_work
#BSUB -J psipred
#BSUB -W 01:00
#BSUB -n 64
#BSUB -o psipred.out
#BSUB -e psipred.err
module load compiler/gnu-4.8.0
module load R/3.0.1
export OMP_NUM_THREADS=4
code=${HOME}/Phd/script_dev/rfpipeline.sh
MYPATH=$HOME/Phd/script_dev/
cd ${MYPATH}
${code} myfile
and I wrote a bash script, sender.sh to both modify the variable myfile with a command line argument, and send the modified file off to bsub:
#!/bin/bash
sed "s/myfile/$1/g" < myprogram.sh | bsub
being careful to use double quotes so that bash does not read $ literally. I then simply run ./sender.sh jobfile.txt which works!
Hope this helps anybody.
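An alternative to rewriting the file with sed is to generate the script on the fly with a here-document (the same trick shown for qsub further down this page): with an unquoted delimiter, the outer shell expands $1 before bsub reads the text. A sketch reusing the directives from the question:
#!/bin/bash
bsub <<EOF
#BSUB -q q_ab_mpc_work
#BSUB -J psipred
#BSUB -W 01:00
#BSUB -n 64
#BSUB -o psipred.out
#BSUB -e psipred.err
module load compiler/gnu-4.8.0
module load R/3.0.1
export OMP_NUM_THREADS=4
cd \$HOME/Phd/script_dev/
\$HOME/Phd/script_dev/rfpipeline.sh $1
EOF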
This answer should resolve your problem:
https://unix.stackexchange.com/questions/144518/pass-argument-to-script-then-redirect-script-as-input-to-bsub
Just pass the script with arguments at the end of the bsub command.
Ex.
example.sh
#!/bin/bash
export input=${1}
echo "arg input: ${input}"
bsub command:
bsub [bsub args] "path/to/example.sh arg1"
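One caveat (classic LSF behavior; check your version): embedded #BSUB directives are honored when the script is fed to bsub on standard input, but when the script is passed as a command they may be ignored, in which case the options need to be repeated on the bsub command line:
bsub -q q_ab_mpc_work -J psipred -W 01:00 -n 64 \
    -o psipred.out -e psipred.err \
    "path/to/example.sh arg1"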

Use variables as argument parameters inside qsub script

I want to pick up a number of models from a folder and use them in an SGE script for an array job. So I do the following in the SGE script:
MODELS=/home/sahil/Codes/bistable/models
numModels=(`ls $MODELS|wc -l`)
echo $numModels
#$ -S /bin/bash
#$ -cwd
#$ -V
#$ -t 1-$[numModels] # Running array job over all files in the models directory.
model=(`ls $MODELS`)
echo "Starting ${model[$SGE_TASK_ID-1]}..."
But I get the following error:
Unable to read script file because of error: Numerical value invalid!
The initial portion of string "$numModels" contains no decimal number
I have also tried to use
#$ -t 1-${numModels}
and
#$ -t 1-(`$numModels`)
but none of these work. Any suggestions/alternate methods are welcome, but they must use the array job functionality of qsub.
Beware that, to Bash, #$ -t 1-$[numModels] is nothing more than a comment, so no variable expansion is applied to numModels; these directive lines are parsed by qsub, not by the shell.
One option is to pass the -t argument on the command line and remove it from your script:
#$ -S /bin/bash
#$ -cwd
#$ -V
model=(`ls $MODELS`)
echo "Starting ${model[$SGE_TASK_ID-1]}..."
and submit the script with
export MODELS=/home/sahil/Codes/bistable/models
qsub -t 1-$(ls "$MODELS" | wc -l) submit.sh
(MODELS is exported on its own line because a prefix assignment such as VAR=value cmd is not visible to command substitutions on the same command line; thanks to #$ -V, the exported MODELS also reaches the job.)
If you prefer to have a self-contained submission script, another option is to pass the content of the whole script to qsub through stdin with a here-document. The variables must be set before the here-document so that the submitting shell can expand them, and anything meant to be evaluated inside the job (such as $SGE_TASK_ID) must be escaped:
#!/bin/bash
MODELS=/home/sahil/Codes/bistable/models
numModels=$(ls $MODELS | wc -l)
qsub <<EOT
#$ -S /bin/bash
#$ -cwd
#$ -V
#$ -t 1-$numModels
model=(\$(ls $MODELS))
echo "Starting \${model[\$SGE_TASK_ID-1]}..."
EOT
Then you execute or source that script directly to submit your job array (./submit.sh rather than qsub submit.sh, as the qsub command is here part of the script).

Directly pass parameters to pbs script

Is there a way to directly pass parameters to a .pbs script before submitting a job? I need to loop over a list of files indicated by different numbers and apply a script to analyze each file.
The best I've been able to come up with is the following:
#!/bin/sh
for ((i = 1; i <= 10; i++))
do
    export FILENUM=$i
    qsub pass_test.pbs
done
where pass_test.pbs is the following script:
#!/bin/sh
#PBS -V
#PBS -S /bin/sh
#PBS -N pass_test
#PBS -l nodes=1:ppn=1,walltime=00:02:00
#PBS -M XXXXXX@XXX.edu
cd /scratch/XXXXXX/pass_test
./run_test $FILENUM
But this feels a bit wonky. In particular, I want to avoid having to create an environment variable to handle this.
The qsub utility can read the script from the standard input, so by using a here document you can create scripts on the fly, dynamically:
#!/bin/sh
for i in `seq 1 10`
do
    cat <<EOS | qsub -
#!/bin/sh
#PBS -V
#PBS -S /bin/sh
#PBS -N pass_test
#PBS -l nodes=1:ppn=1,walltime=00:02:00
#PBS -M XXXXXX@XXX.edu
cd /scratch/XXXXXX/pass_test
./run_test $i
EOS
done
Personally, I would use a more compact version:
#!/bin/sh
for i in `seq 1 10`
do
    cat <<EOS | qsub -V -S /bin/sh -N pass_test -l nodes=1:ppn=1,walltime=00:02:00 -M XXXXXX@XXX.edu -
cd /scratch/XXXXXX/pass_test
./run_test $i
EOS
done
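One caveat with these unquoted here-documents: the submitting shell expands every $, which is exactly what makes $i work, but any variable meant to be evaluated inside the job must be escaped. A small illustrative sketch for the loop body above:
cat <<EOS | qsub -
#PBS -N pass_test
# $i expands now, at submission time; \$PBS_JOBID expands later, inside the job
echo "file number $i, job \$PBS_JOBID"
EOS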
You can use the -F option, as described here:
-F
Specifies the arguments that will be passed to the job script when the script is launched. The accepted syntax is:
qsub -F "myarg1 myarg2 myarg3=myarg3value" myscript2.sh
Note: Quotation marks are required. qsub will fail with an error message if the argument following -F is not a quoted value. The pbs_mom server will pass the quoted value as arguments to the job script when it launches the script.
See also this answer
If you just need to pass numbers and run a list of jobs with the same command except for the input file number, it's better to use a job array instead of a for loop, as a job array puts less of a burden on the job scheduler.
In the .pbs file, you read the file number from PBS_ARRAYID like this:
./run_test ${PBS_ARRAYID}
And to invoke it, on command line, type:
qsub -t 1-10 pass_test.pbs
where the range of array IDs is specified after the -t option.
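Putting it together, pass_test.pbs from the question needs only its FILENUM line replaced (same directives as before):
#!/bin/sh
#PBS -V
#PBS -S /bin/sh
#PBS -N pass_test
#PBS -l nodes=1:ppn=1,walltime=00:02:00
#PBS -M XXXXXX@XXX.edu
cd /scratch/XXXXXX/pass_test
./run_test ${PBS_ARRAYID}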
