Read job name from bash script parameters in SGE - bash

I am running Sun Grid Engine for submitting jobs, and I want to have a single bash script that submits whatever file I need to run, instead of having to run a different qsub command with a different bash file for each of the jobs. I have been able to generate output and error files that share the name of the input file, but now I am struggling with setting a different job name for each submission. My approach has been the following:
#!/bin/bash
#
#$ -cwd
#$ -S /bin/bash
#$ -N $1
#
python -u $1 >/output_dir/$1.out 2>/error_dir/$1.error
This way, running qsub send_to_sge.sh foo executes the program, and creates the files foo.error and foo.out with the errors and printouts, respectively. However, the job appears with the name $1 in the SGE queue. Instead, I would like to have foo as the job name. Is there any way to achieve what I am seeking?
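One way to get there, sketched under the assumption that your SGE honours command-line options over embedded #$ directives (it normally does): keep the job script generic and pass -N at submission time through a thin wrapper. The wrapper name submit_sge.sh below is made up for illustration.
#!/bin/bash
# submit_sge.sh (hypothetical wrapper): ./submit_sge.sh foo
# -N on the qsub command line takes precedence over the #$ -N line
# inside send_to_sge.sh, so the job shows up in the queue as "foo".
qsub -N "$1" send_to_sge.sh "$1"
Inside send_to_sge.sh, $1 is still the file name, so the redirections to $1.out and $1.error keep working; you can also simply drop the #$ -N $1 line from that script.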

Related

qsub is executing my bash script in csh despite shebang

I want to submit a bash script to my university's Sungrid computing cluster to run an executable in a loop. When I log in to the server, I'm in bash:
$ echo $SHELL
/bin/bash
And I include a bash shebang at the top of the script that I pass to qsub:
$ cat shell_sub
#!/bin/bash
#$ -N bSS_s13
#$ -o logs/bSS_s13.log
#$ -j y
#$ -cwd
echo $SHELL > shell.txt
But when I submit the above script:
qsub shell_sub
It instead executes in csh:
$ cat shell.txt
/bin/csh
How can I force qsub to execute my script with bash instead of csh?
Most likely your queue is configured with shell_start_mode as posix_compliant and the defined shell is listed as /bin/csh (which is the default). To check this:
$ qconf -sq <name-of-queue> | grep shell
shell /bin/bash
shell_start_mode unix_behavior
If you don't know the name of your queue, it's probably all.q.
If shell_start_mode is posix_compliant, then the shebang line is ignored and the job (if it's not submitted as binary: -b y) is started with the shell defined by the shell setting.
Why? From the man page: "POSIX does not consider first script line comments such a ‘#!/bin/csh’ as being significant. The POSIX standard for batch queuing systems (P1003.2d) therefore requires a compliant queuing system to ignore such lines but to use user specified or configured default command interpreters instead."
If shell_start_mode is unix_behavior, then the shebang line is used to determine the shell for the job.
You can ask your administrator to consider changing the queue settings.
You can set the shell for a submitted job (at least in Torque) using -S.
For example: qsub shell_sub -S /bin/bash
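With Grid Engine you can also embed the same option in the job script itself as an #$ directive, so it does not have to be typed on every submission. A minimal sketch based on the script above, assuming your site honours embedded directives:
#!/bin/bash
#$ -S /bin/bash      # ask SGE to run this job script with bash
#$ -N bSS_s13
#$ -o logs/bSS_s13.log
#$ -j y
#$ -cwd
echo $SHELL > shell.txt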

How can I send a batch job to PBS using a function in Shell?

I can submit a job to PBS using both approaches of Non-interactive Batch Jobs and/or Interactive Batch Jobs. However, I need to use the PBS commands in a function. In other words, I need a structure like this:
#!/bin/sh
pbs_setup () {
#PBS -l $1
#PBS -N $2
#PBS -q normal
#PBS -A $USER
#PBS -m ae
#PBS -M $USER@gmail.com
#PBS -q normal
#PBS -l nodes=1:ppn=8
#PBS
}
pbs_setup "walltime=6:00:00" "step3";
echo " "
echo "Job started
echo " "
echo "Job Ended
When I submit this job, it does not work.
In fact, my final goal is to separate the job directives from the main body of the code, so that when the HPC system changes I only need to edit the shell file containing this function instead of editing all the scripts. I would appreciate any suggestions.
You could create a custom submission command that collects the job options and passes them as command-line parameters to the actual qsub call.
Here is a rather basic example of this. In real usage I would add more sophisticated parameter handling tailored to the type of jobs, and more consistent with the qsub interface. Handling interactive jobs would also need additional work.
submit.sh
#!/bin/bash
walltime="${2:-06:00:00}"
name="${3:-step3}"
queue="normal"
acct="$USER"
mailevents="ae"
mailaddress="$USER@gmail.com"
resources="nodes=1:ppn=8"
if [ $# -lt 1 ] ; then
echo "Usage: submit.sh script [walltime [name]]" >
exit 1
fi
script="$1"
qsub -l "$walltime" -N "$name" -q "$queue" -A "$acct" \
-m "$mailevents" -M "$mailaddress" -l "$resources" "$script"
script.sh
#!/bin/bash
echo " "
echo "Job started"
echo " "
echo "Job Ended"
This is supposed to be used as
submit.sh script.sh 06:00:00 step3
The issue with that job script is that #PBS directives are only honoured in the comment block at the top of the script, before the first executable command; directives placed inside a shell function body are never seen by qsub.
In my attempt at this same concept, I used the same type of function you have, but I cat the directives and the actual commands into another file, i.e. an overarching script creates the 'job' script. You can keep the HPC requirements in a separate file and then source (or cat) it from the creation script, roughly as sketched below.
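For illustration, a rough sketch of that generator idea; the file names pbs_header.txt and make_job.sh are invented here, and the directives simply mirror the ones in the question:
pbs_header.txt (the site-specific directives, kept in one place):
#PBS -q normal
#PBS -l nodes=1:ppn=8
#PBS -m ae
make_job.sh:
#!/bin/bash
# Hypothetical generator: writes a complete job script, then submits it.
walltime="$1"    # e.g. walltime=6:00:00
name="$2"        # e.g. step3
{
  echo '#!/bin/bash'
  echo "#PBS -l $walltime"
  echo "#PBS -N $name"
  cat pbs_header.txt          # shared directives from the separate file
  echo 'echo "Job started"'
  echo 'echo "Job Ended"'
} > "$name.job"
qsub "$name.job"
Because the generated $name.job has all #PBS lines before the first command, qsub sees them, and changing HPC systems only means editing pbs_header.txt.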
Edit in response to comment:
e.g.
To specify a path to start the job from:
#PBS -d init_path
"working directory path to be used for the job, PBS_O_INITDIR"
Or
#PBS -D root_path
"root directory to be used for the job, PBS_O_ROOTDIR."
Or
#PBS -w working_path
"If the -w option is not specified, the default working directory is the current directory. This option sets the environment variable PBS_O_WORKDIR."
So the default PBS_O_WORKDIR is the directory you are in when you call qsub to submit the script.
Thus, if you set the specific options (d, D, w) for paths relative to the actual script running environment, you'll be able to use the paths you intend.
For specifics including default values of these and other options, you can check out the man page for your app. If using the Torque version of the PBS system, it's available at linux.die.net - qsub
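For example, with Torque's qsub the starting directory can also be given on the command line rather than in the script (the path below is only a placeholder):
qsub -d /path/to/run_dir myjob.pbs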

Get SGE jobid to make a pipeline

Suppose I want to write a pipeline of tasks to submit to Sun/Oracle Grid Engine.
qsub -cwd touch a.txt
qsub -cwd -hold_jid touch wc -l a.txt
Now, this will run the 2nd job (wc) only after the first job (touch) is done. However, if a previous job with the name touch had run earlier, the 2nd job won't be held since the condition is already satisfied. I need the jobid of the first job.
I tried
myjid=`qsub -cwd touch a.txt`
But that captured the whole submission message:
$ echo $myjid
Your job 1062487 ("touch") has been submitted
You just need to add the -terse option to the first qsub so that it only displays the jobid rather than the whole string.
JID=`qsub -terse -cwd touch a.txt`
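The captured id can then be fed to -hold_jid to chain the two example jobs from the question:
JID=`qsub -terse -cwd touch a.txt`
qsub -cwd -hold_jid "$JID" wc -l a.txt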

Pass command line arguments via sbatch

Suppose that I have the following simple bash script which I want to submit to a batch server through SLURM:
#!/bin/bash
#SBATCH -o "outFile"$1".txt"
#SBATCH -e "errFile"$1".txt"
hostname
exit 0
In this script, I simply want to write the output of hostname on a textfile whose full name I control via the command-line, like so:
login-2:jobs$ sbatch -D `pwd` exampleJob.sh 1
Submitted batch job 203775
Unfortunately, it seems that my last command-line argument (1) is not parsed through sbatch, since the files created do not have the suffix I'm looking for and the string "$1" is interpreted literally:
login-2:jobs$ ls
errFile$1.txt exampleJob.sh outFile$1.txt
I've looked around places in SO and elsewhere, but I haven't had any luck. Essentially what I'm looking for is the equivalent of the -v switch of the qsub utility in Torque-enabled clusters.
Edit: As mentioned in the comment thread, I solved my problem the hard way: instead of having one single script submitted several times to the batch server, each time with different command-line arguments, I created a "master script" that simply echoed and redirected the same content into different scripts, with the content of each changed by the command-line parameter passed. I then submitted all of those to my batch server through sbatch. However, this does not answer the original question, so I hesitate to add it as an answer or mark the question solved.
I thought I'd offer some insight because I was also looking for the replacement to the -v option in qsub, which for sbatch can be accomplished using the --export option. I found a nice site here that shows a list of conversions from Torque to Slurm, and it made the transition much smoother.
You can export the environment variable ahead of time in your shell (note that sbatch options must appear before the script name, otherwise they are passed as arguments to the script):
$ export var_name='1'
$ sbatch --export=var_name -D `pwd` exampleJob.sh
Or define it directly within the sbatch command, just like qsub allowed:
$ sbatch --export=var_name='1' -D `pwd` exampleJob.sh
Whether this works inside the #SBATCH header lines of exampleJob.sh is another question, but I assume it should give the same functionality found in Torque.
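As a small illustration of that caveat (the file name and pattern below are only an example): variables passed with --export do show up in the job's environment, so they can be used in the body of the script, but they are not expanded inside the #SBATCH header lines themselves; for the output file name you have to fall back on sbatch's own patterns such as %j:
#!/bin/bash
#SBATCH -o exampleJob.%j.out        # %j = job id; $var_name is NOT expanded here
hostname > "outFile${var_name}.txt" # but the exported variable works fine here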
Using a wrapper is more convenient. I found this solution from this thread.
Basically, the problem is that the #SBATCH directives are comments as far as the shell is concerned, so the passed arguments are never expanded in them. Instead, you can use a here document to feed the job script to sbatch after the outer shell has substituted the arguments.
In case of your question you can substitute the shell script file with this:
#!/bin/bash
sbatch <<EOT
#!/bin/bash
#SBATCH -o "outFile"$1".txt"
#SBATCH -e "errFile"$1".txt"
hostname
exit 0
EOT
And you run the shell script like this:
bash [script_name].sh [suffix]
And the outputs will be saved to outFile[suffix].txt and errFile[suffix].txt
If you pass your options on the command line, you can bypass the issue of not being able to use command-line arguments inside the batch script. For instance, at the command line:
var1="my_error_file.txt"
var2="my_output_file.txt"
sbatch --error=$var1 --output=$var2 batch_script.sh
The lines starting with #SBATCH are not interpreted by bash but are replaced with code by sbatch.
The sbatch options do not support $1 vars (only %j and some others, replacing $1 by %1 will not work).
When you don't have different sbatch processes running in parallel, you could try
#!/bin/bash
touch outFile${1}.txt errFile${1}.txt
rm link_out.sbatch link_err.sbatch 2>/dev/null # remove links from previous runs
ln -s outFile${1}.txt link_out.sbatch
ln -s errFile${1}.txt link_err.sbatch
#SBATCH -o link_out.sbatch
#SBATCH -e link_err.sbatch
hostname
# I do not know about the background processing of sbatch, are the jobs still running
# at this point? When they are, you can not delete the temporary symlinks yet.
exit 0
Alternative:
As you said in a comment yourself, you could make a masterscript.
This script can contain lines like
sed -e 's/File.txt/File'"$1"'.txt/' exampleJob.sh.template > exampleJob.sh
# I do not know, is the following needed with sbatch?
chmod +x exampleJob.sh
In your template the #SBATCH lines look like
#SBATCH -o "outFile.txt"
#SBATCH -e "errFile.txt"
This is an old question but I just stumbled into the same task and I think this solution is simpler:
Let's say I have the variable $OUT_PATH in the bash script launch_analysis.bash and I want to pass this variable to task_0_generate_features.sl which is my SLURM file to send the computation to a batch server. I would have the following in launch_analysis.bash:
sbatch --export=OUT_PATH=$OUT_PATH task_0_generate_features.sl
Which is directly accessible in task_0_generate_features.sl
In @Jason's case we would have:
sbatch -D `pwd` --export=hostname=$hostname exampleJob.sh
Reference: Using Variables in SLURM Jobs
Something like this works for me with Torque:
echo "$(pwd)/slurm.qsub 1" | qsub -S /bin/bash -N Slurm-TEST
slurm.qsub:
#!/bin/bash
hostname > outFile${1}.txt 2>errFile${1}.txt
exit 0

Run Julia codes on a cluster

I aim to run some Julia-coded simulations on a cluster (no complicated parallel processing involved) using a .pbs file (and qsub).
I know two ways to run a .jl file from Bash. The first one is
/path/to/julia myscript.jl
The second one is
exec '/Applications/bla/bla/julia/bin/julia'
include("myscript.jl")
Here is my .pbs file. I cannot test if it works because I don't know yet where the Julia application is stored on the cluster.
#!/bin/bash
#PBS -l procs=1
#PBS -l walltime=240:00:00
#PBS -N Name
#PBS -m ea
#PBS -M name@something.com
#PBS -l pmem=1000mb
#PBS -t 1-3
echo "Starting run at: `date`"
exec '/Applications/bla/bla/julia/bin/julia'
include("myscript.jl")
echo "Job finished with exit code $? at: `date`"
Does it seem correct to you? Or should I, somehow, make an .exec out of my .jl?
You want to directly execute Julia, with your .jl program file as an argument.
Something like:
echo "Starting run at: `date`"
/Applications/bla/bla/julia/bin/julia myscript.jl
echo "Job finished with exit code $? at: `date`"
PBS will catch the standard out and put it in a file such as .pbs.o#### (similarly the standard error in .pbs.e####).
You might find an issue with where your 'present working directory' is when the script runs. Some clusters are set up to 'cd' you to a /tmp/ filesystem, or just drop you in your home directory, rather than starting where the script was submitted from.
In that case, the simple solution is to use a full path for the Julia script, but this makes it difficult to reuse your PBS submission script.
/Applications/bla/bla/julia/bin/julia ~/mydirectory/myscript.jl
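If the usual Torque/PBS environment variables are available on your cluster, another option (a sketch only, not tested on your site) is to cd back to the submission directory inside the job script, which keeps both the script path and the .pbs file reusable:
#!/bin/bash
#PBS -l procs=1
#PBS -N Name
cd "$PBS_O_WORKDIR"         # the directory qsub was invoked from
echo "Starting run at: `date`"
/Applications/bla/bla/julia/bin/julia myscript.jl
echo "Job finished with exit code $? at: `date`"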
