Bash script error to run MATLAB - bash

I'm trying to run a MATLAB script (ga_opt_main.m) on a cluster. I have to write a job submission file, which is essentially just a shell script. I have never written a shell script before, and this is what I wrote:
#!/bin/bash
#PBS -q *queuename*
#PBS -l nodes=1:ppn=20
#PBS -l walltime=02:00:00
#PBS -N ga_opt_main
module load matlab/R2011b
module list
unset DISPLAY
matlab -nodisplay -nodesktop -r *directory path/ga_opt_main.m*
MATLAB opens in the background but my job is not run. Instead I get an error file saying
bash: -c: line 0: syntax error in conditional expression
bash: -c: line 0: syntax error near `fraction'
Any ideas on why this occurs and how it can be avoided?
Thanks!

I've never used PBS before, but to run a MATLAB script from the shell, try the following:
matlab -nodesktop -nodisplay -r "addpath('/directory/path'); ga_opt_main; quit;"
where ga_opt_main.m is the name of the script file, and '/directory/path' is the directory where it resides. Note that you must have any other dependencies to this script on the MATLAB path as well.
There is also a convenient RUN function that does something similar:
matlab ... -r "run('/directory/path/ga_opt_main.m'); quit;"
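As a rough sketch of the first form, the -r string can be built from the script path in the shell. The helper name matlab_r_arg is made up for this illustration, and the command is only printed here, not run, so it also works on a machine without MATLAB:

```shell
#!/bin/sh
# matlab_r_arg is a hypothetical helper, not part of MATLAB or PBS.
# It turns /some/dir/name.m into: addpath('/some/dir'); name; quit;
matlab_r_arg() {
    printf "addpath('%s'); %s; quit;\n" "$(dirname "$1")" "$(basename "$1" .m)"
}

# Print the command instead of running it (dry run).
printf 'matlab -nodesktop -nodisplay -r "%s"\n' "$(matlab_r_arg /directory/path/ga_opt_main.m)"
```

The basename call strips the .m suffix, so what reaches -r is a script name rather than a filename.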

###############################
#!/bin/sh
#PBS -l nodes=1
#PBS -l walltime=2:0:0
#PBS -j oe
#PBS -o localhost:/dev/null
#PBS -d /your/working/directory
cd $PBS_O_WORKDIR
matlab -nodisplay -nodesktop -nojvm -nosplash -r "your_matlab_function"
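I like to add addpath(genpath('~/your/scripts/home')); to the actual MATLAB script/function. Also, do not add the ".m" extension to the name you pass to -r: the option takes a MATLAB statement, so give it the script or function name, not the filename.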

Related

Set number of gpus in PBS script from command line

I'm invoking a job with qsub myjob.pbs. In there, I have some logic to run my experiments, which includes running torchrun, a distributed utility for pytorch. In that command you can set the number of nodes and number of processes (+gpus) per node. Depending on the availability, I want to be able to invoke qsub with an arbitrary number of GPUs, so that both -l gpus= and torchrun --nproc_per_node= are set depending on the command line argument.
I tried the following:
#!/bin/sh
#PBS -l "nodes=1:ppn=12:gpus=$1"
torchrun --standalone --nnodes=1 --nproc_per_node=$1 myscript.py
and invoked it like so:
qsub --pass "4" myjob.pbs
but I got the following error: ERROR: -l: gpus: expected valid integer, found '"$1"'. Is there a way to pass the number of GPUs to the script so that the PBS directives can read them?
The problem is that PBS directives are plain comments as far as your shell is concerned, and qsub reads them literally without performing any shell expansion. This means that $1 will not be expanded in:
#PBS -l "nodes=1:ppn=12:gpus=$1"
Instead, you can apply the -l gpus= argument on the command line and remove the directive from your PBS script. For example:
#!/bin/sh
#PBS -l ncpus=12
set -eu
torchrun \
--standalone \
--nnodes=1 \
--nproc_per_node="${nproc_per_node}" \
myscript.py
Then just use a simple wrapper, e.g. run_myjob.sh:
#!/bin/sh
set -eu
qsub \
-l gpus="$1" \
-v nproc_per_node="$1" \
myjob.pbs
Which should let you specify the number of gpus as a command-line argument:
sh run_myjob.sh 4
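If you want the wrapper to reject a bad argument before it ever reaches qsub, a check along these lines might help. This is only a sketch: the function names is_positive_int and gpu_qsub_command are invented for the example, and it prints the qsub command rather than submitting anything.

```shell
#!/bin/sh
# is_positive_int VALUE: succeed only if VALUE is all digits and greater than zero.
is_positive_int() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$1" -gt 0 ]
}

# gpu_qsub_command N: print the qsub invocation for N GPUs (dry run, nothing is submitted).
gpu_qsub_command() {
    if ! is_positive_int "$1"; then
        echo "usage: run_myjob.sh <num_gpus>" >&2
        return 1
    fi
    printf 'qsub -l gpus=%s -v nproc_per_node=%s myjob.pbs\n' "$1" "$1"
}

gpu_qsub_command 4
```

Replacing the printf with the real qsub call would turn the dry run into an actual submission.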

How can I send a batch job to PBS using a function in Shell?

I can submit a job to PBS both as a non-interactive batch job and as an interactive batch job. However, I need to use the PBS commands in a function. In other words, I need a structure like this:
#!/bin/sh
pbs_setup () {
#PBS -l $1
#PBS -N $2
#PBS -q normal
#PBS -A $USER
#PBS -m ae
#PBS -M $USER"@gmail.com"
#PBS -q normal
#PBS -l nodes=1:ppn=8
#PBS
}
pbs_setup "walltime=6:00:00" "step3";
echo " "
echo "Job started"
echo " "
echo "Job Ended"
When I submit this job, it does not work.
In fact, my final goal is to separate the job-submission commands from the main body of the code, so that when the HPC system changes I only have to edit the one shell file that contains this function instead of editing every script. I would appreciate any suggestions.
You could create your own submission command that collects the job options and passes them as command-line parameters to the actual qsub call.
Here is a rather basic example of this. In real usage, I would add more sophisticated parameter handling tailored to the type of jobs and more consistent with the qsub interface. Handling interactive jobs would also need additional work.
submit.sh
#!/bin/bash
walltime="${2:-06:00:00}"
name="${3:-step3}"
queue="normal"
acct="$USER"
mailevents="ae"
mailaddress="$USER@gmail.com"
resources="nodes=1:ppn=8"
if [ $# -lt 1 ] ; then
echo "Usage: submit.sh script [walltime [name]]" >&2
exit 1
fi
script="$1"
qsub -l "walltime=$walltime" -N "$name" -q "$queue" -A "$acct" \
-m "$mailevents" -M "$mailaddress" -l "$resources" "$script"
script.sh
#!/bin/bash
echo " "
echo "Job started"
echo " "
echo "Job Ended"
This is supposed to be used as
submit.sh script.sh 06:00:00 step3
The issue with that job script is that qsub only reads #PBS lines that appear before the first executable line of the script file, so directives placed inside a function body are ignored.
When I tried to do the same thing, I used the same kind of function you have, but wrote its output together with the actual commands into another file using cat. In other words, an overarching script creates the 'job' script. You can put the HPC requirements in a separate file, then source it from the creation script.
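A minimal sketch of that generator idea is below. The function names pbs_header and make_job and the site_* variables are placeholders chosen for this example; in practice the two site_* settings could live in a separate file that the creation script sources. The script prints the generated job script so you can inspect it before piping it to a file or to qsub.

```shell
#!/bin/sh
# Site-specific settings (hypothetical names). These could be moved to a
# separate file and pulled in with the "." (source) command.
site_queue="normal"
site_resources="nodes=1:ppn=8"

# pbs_header WALLTIME NAME: print the shebang and the #PBS directives.
pbs_header() {
    printf '#!/bin/sh\n'
    printf '#PBS -l walltime=%s\n' "$1"
    printf '#PBS -N %s\n' "$2"
    printf '#PBS -q %s\n' "$site_queue"
    printf '#PBS -l %s\n' "$site_resources"
}

# make_job WALLTIME NAME COMMANDS_FILE: header first, then the job commands.
make_job() {
    pbs_header "$1" "$2"
    cat "$3"
}

# Example: put two commands in a temporary file and generate the job script.
commands_file=$(mktemp)
printf 'echo "Job started"\necho "Job Ended"\n' > "$commands_file"
make_job 6:00:00 step3 "$commands_file"
rm -f "$commands_file"
```

Because the directives are written out as real text at the top of the generated file, qsub sees them before any executable line.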
Edit in response to comment:
e.g.
To specify a path to start the job from:
#PBS -d init_path
"working directory path to be used for the job, PBS_O_INITDIR"
Or
#PBS -D root_path
"root directory to be used for the job, PBS_O_ROOTDIR."
Or
#PBS -w working_path
"If the -w option is not specified, the default working directory is the current directory. This option sets the environment variable PBS_O_WORKDIR."
So by default, PBS_O_WORKDIR is the directory you are in when you submit the script with qsub.
Thus, if you set these options (-d, -D, -w) to the paths you want the job to run in, you will be able to use the paths you intend.
For specifics, including the default values of these and other options, check the qsub man page for your system. For the Torque version of PBS, it is available at linux.die.net (qsub).

Run Julia codes on a cluster

I aim to run some Julia-coded simulations on a cluster (no complicated parallel processing involved) using a .pbs file (and qsub).
I know two ways to run a .jl file from Bash. The first one is
/path/to/julia myscript.jl
The second one is
exec '/Applications/bla/bla/julia/bin/julia'
include("myscript.jl")
Here is my .pbs file. I cannot test if it works because I don't know yet where the Julia application is stored on the cluster.
#!/bin/bash
#PBS -l procs=1
#PBS -l walltime=240:00:00
#PBS -N Name
#PBS -m ea
#PBS -M name@something.com
#PBS -l pmem=1000mb
#PBS -t 1-3
echo "Starting run at: `date`"
exec '/Applications/bla/bla/julia/bin/julia'
include("myscript.jl")
echo "Job finished with exit code $? at: `date`"
Does it seem correct to you? Or should I, somehow, make an .exec out of my .jl?
You want to directly execute Julia, with your .jl program file as an argument.
Something like:
echo "Starting run at: `date`"
/Applications/bla/bla/julia/bin/julia myscript.jl
echo "Job finished with exit code $? at: `date`"
PBS will catch the standard out and put it in a file such as .pbs.o#### (similarly the standard error in .pbs.e####).
You might run into a problem with what the 'present working directory' is when the script runs. Some clusters are set up to 'cd' you to a /tmp/ filesystem, or just drop you in your home directory, rather than leaving you where the script was submitted from.
In that case, the simple solution is to use a full path for the Julia script, but this makes it difficult to reuse your PBS submission script.
/Applications/bla/bla/julia/bin/julia ~/mydirectory/myscript.jl
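One way to keep the submission script reusable is to resolve the Julia file against the directory the job was submitted from. The sketch below assumes PBS sets PBS_O_WORKDIR inside the job and falls back to the current directory otherwise; script_path is a made-up helper name, and the command is only printed, so this runs without Julia installed.

```shell
#!/bin/sh
# script_path NAME: print NAME resolved against the submission directory.
# Outside a PBS job, PBS_O_WORKDIR is unset and the current directory is used.
script_path() {
    printf '%s/%s\n' "${PBS_O_WORKDIR:-$PWD}" "$1"
}

# Placeholder location taken from the question; adjust for the real cluster.
julia_bin=/Applications/bla/bla/julia/bin/julia

# Dry run: show the command that the job would execute.
echo "$julia_bin" "$(script_path myscript.jl)"
```

With this, the same .pbs file works from any directory that contains myscript.jl.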

sleep command not found in torque pbs but works in shell

We create a torque pbs file "testpbs" as follows:
#!/bin/sh
#PBS -N T1272_flt
#PBS -q batch
#PBS -l nodes=1:ppn=1
#PBS -o /data/software/torque-4.2.6.1/testpbs.sh.out
#PBS -e /data/software/torque-4.2.6.1/testpbs.sh.err
sleep 20
Then we submitted the file testpbs:
qsub testpbs
We got error messages:
more testpbs.sh.err
/var/spool/torque/mom_priv/jobs/8.centos64.SC: line 9: sleep: command not found
However, when we run sleep 20 on the command line, no error occurs.
$sleep 20
Thanks in advance.
We ran echo $PATH in shell and got the following:
echo $PATH
/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/X11R6/bin:/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.9.x86_64/bin:/data/software/cufflinks-2.0.2.Linux_x86_64:/home/amin/bin/blast-2.2.19:/root/bin:/home/amin/bin
We use qstat -f jobid to review the details of this job.
PBS_O_LOGNAME=amin,
PBS_O_PATH= /usr/lib64/qt-3.3/bin: /usr/local/sbin: /usr/local/bin:
/sbin: /bin: /usr/sbin: /usr/bin: /sbin:/bin: /usr/sbin: /usr/bin:
/usr/X11R6/bin: /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.9.x86_64/bin:
/data/software/cufflinks-2.0.2.Linux_x86_64:
/home/amin/bin/blast-2.2.19: /root/bin: /home/aimin/bin,
PBS_O_MAIL=/var/spool/mail/root,
PBS_O_SHELL=/bin/bash,
PBS_O_LANG=en_US.UTF-8,
PBS_O_WORKDIR=/data/software/torque-4.2.6.1,
PBS_O_HOST=centos64,
PBS_O_SERVER=centos64
Thanks to larsks for the great help. The following works:
#!/bin/sh
#PBS -N T1272_flt
#PBS -q batch
#PBS -l nodes=1:ppn=1
#PBS -o /data/software/torque-4.2.6.1/testpbs.sh.out
#PBS -e /data/software/torque-4.2.6.1/testpbs.sh.err
export PATH=$PBS_O_PATH
sleep 20
Try replacing sleep with the full path to the command (possibly /usr/bin/sleep) and see if that changes the behavior. If it does, then your script, when run under Torque, simply has a different (or empty) $PATH variable.
You can either (a) continue to use explicit paths, or (b) set $PATH explicitly in your script, e.g.:
PATH=/bin:/usr/bin:/usr/local/bin
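A slightly more defensive variant of (b) keeps whatever PATH the job already has and only appends the standard directories when they are missing. This is a sketch, and path_with is a name invented for the example:

```shell
#!/bin/sh
# path_with DIR CURRENT: print CURRENT with DIR appended, unless DIR is already in it.
path_with() {
    case ":$2:" in
        *":$1:"*) printf '%s\n' "$2" ;;
        *) printf '%s\n' "${2:+$2:}$1" ;;
    esac
}

PATH=$(path_with /bin "$PATH")
PATH=$(path_with /usr/bin "$PATH")
export PATH

# Show where sleep is found now.
command -v sleep
```

Wrapping both sides in colons before matching avoids false hits such as /usr/bin matching inside /usr/bin/X11.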

Directly pass parameters to pbs script

Is there a way to directly pass parameters to a .pbs script before submitting a job? I need to loop over a list of files indicated by different numbers and apply a script to analyze each file.
The best I've been able to come up with is the following:
#!/bin/sh
for ((i= 1; i<= 10; i++))
do
export FILENUM=$i
qsub pass_test.pbs
done
where pass_test.pbs is the following script:
#!/bin/sh
#PBS -V
#PBS -S /bin/sh
#PBS -N pass_test
#PBS -l nodes=1:ppn=1,walltime=00:02:00
#PBS -M XXXXXX@XXX.edu
cd /scratch/XXXXXX/pass_test
./run_test $FILENUM
But this feels a bit wonky. Particularly, I want to avoid having to create an environment variable to handle this.
The qsub utility can read the script from the standard input, so by using a here document you can create scripts on the fly, dynamically:
#!/bin/sh
for i in `seq 1 10`
do
cat <<EOS | qsub -
#!/bin/sh
#PBS -V
#PBS -S /bin/sh
#PBS -N pass_test
#PBS -l nodes=1:ppn=1,walltime=00:02:00
#PBS -M XXXXXX@XXX.edu
cd /scratch/XXXXXX/pass_test
./run_test $i
EOS
done
Personally, I would use a more compact version:
#!/bin/sh
for i in `seq 1 10`
do
cat <<EOS | qsub -V -S /bin/sh -N pass_test -l nodes=1:ppn=1,walltime=00:02:00 -M XXXXXX@XXX.edu -
cd /scratch/XXXXXX/pass_test
./run_test $i
EOS
done
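One detail worth knowing with this approach: because the EOS delimiter is unquoted, variables in the here document are expanded when the script is generated, not when the job runs. Anything that should be evaluated on the compute node needs a backslash. The sketch below prints the generated script instead of piping it to qsub; render_job is a hypothetical name, and the hostname line is an assumed extra added just to show the escaping.

```shell
#!/bin/sh
# render_job N: print the job script for file number N without submitting it.
render_job() {
    cat <<EOS
#!/bin/sh
#PBS -N pass_test
cd /scratch/XXXXXX/pass_test
./run_test $1
echo "running on \$(hostname)"
EOS
}

render_job 3
```

In the output, $1 has been replaced by 3, while $(hostname) is still literal and will only be evaluated inside the job.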
You can use the -F option, as described in the qsub documentation:
-F
Specifies the arguments that will be passed to the job script when the script is launched. The accepted syntax is:
qsub -F "myarg1 myarg2 myarg3=myarg3value" myscript2.sh
Note: Quotation marks are required. qsub will fail with an error
message if the argument following -F is not a quoted value. The
pbs_mom server will pass the quoted value as arguments to the job
script when it launches the script.
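On the receiving side, the job script can then read those values. The sketch below assumes they arrive as ordinary positional parameters ($1, $2, ...) and simulates that with set -- so it can be tried without a scheduler; arg_value is a helper name made up for this example.

```shell
#!/bin/sh
# arg_value KEY ARGS...: print the value of the first KEY=value argument, or fail.
arg_value() {
    wanted=$1
    shift
    for arg in "$@"; do
        case "$arg" in
            "$wanted"=*) printf '%s\n' "${arg#*=}"; return 0 ;;
        esac
    done
    return 1
}

# Simulate what myscript2.sh would receive from:
#   qsub -F "myarg1 myarg2 myarg3=myarg3value" myscript2.sh
set -- myarg1 myarg2 myarg3=myarg3value
echo "first positional argument: $1"
echo "myarg3 = $(arg_value myarg3 "$@")"
```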
If you just need to pass numbers and run a list of jobs with the same command except for the input file number, it is better to use a job array instead of a for loop, because a job array puts less load on the job scheduler.
To do so, refer to the file number through PBS_ARRAYID in the .pbs file, like this:
./run_test ${PBS_ARRAYID}
And to invoke it, on command line, type:
qsub -t 1-10 pass_test.pbs
where the range after the -t option specifies which array IDs to use.
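If the inputs are not simply numbered, one common pattern is to keep the names in a list file and let the array ID pick a line. This is a sketch under assumptions: the list contents and the nth_line helper are invented for illustration, and PBS_ARRAYID defaults to 1 so the script can be dry-run outside a job.

```shell
#!/bin/sh
# nth_line N FILE: print line N of FILE (1-based).
nth_line() {
    sed -n "${1}p" "$2"
}

# Hypothetical list of inputs, one per line.
list=$(mktemp)
printf 'sample_a.dat\nsample_b.dat\nsample_c.dat\n' > "$list"

# PBS_ARRAYID is set inside an array job; fall back to 1 for a local dry run.
task_id=${PBS_ARRAYID:-1}
input=$(nth_line "$task_id" "$list")

# Dry run: show the command the array task would execute.
echo "./run_test $input"
rm -f "$list"
```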

Resources