Run Julia codes on a cluster - bash

I aim to run some Julia-coded simulations on a cluster (no complicated parallel processing involved) using a .pbs file (and qsub).
I know two ways to run a .jl file from Bash. The first one is
/path/to/julia myscript.jl
The second one is
exec '/Applications/bla/bla/julia/bin/julia'
include("myscript.jl")
Here is my .pbs file. I cannot test if it works because I don't know yet where the Julia application is stored on the cluster.
#!/bin/bash
#PBS -l procs=1
#PBS -l walltime=240:00:00
#PBS -N Name
#PBS -m ea
#PBS -M name@something.com
#PBS -l pmem=1000mb
#PBS -t 1-3
echo "Starting run at: `date`"
exec '/Applications/bla/bla/julia/bin/julia'
include("myscript.jl")
echo "Job finished with exit code $? at: `date`"
Does it seem correct to you? Or should I, somehow, make an .exec out of my .jl?

You want to directly execute Julia, with your .jl program file as an argument.
Something like:
echo "Starting run at: `date`"
/Applications/bla/bla/julia/bin/julia myscript.jl
echo "Job finished with exit code $? at: `date`"
PBS will capture the standard output and write it to a file such as .pbs.o#### (and, similarly, the standard error to .pbs.e####).
You might run into an issue with where your 'present working directory' is when the script runs. Some clusters are set up to 'cd' you to a /tmp/ filesystem, or to drop you in your home directory, rather than leaving you where the script was submitted from.
In that case, the simple solution is to use a full path for the Julia script, but this makes it harder to reuse your PBS submission script.
/Applications/bla/bla/julia/bin/julia ~/mydirectory/myscript.jl
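If the cluster does start the job somewhere else, one common way to keep the submission script reusable is to change into the submission directory at the top of the job. The sketch below assumes a PBS/Torque setup that exports PBS_O_WORKDIR (the directory qsub was run from). run_julia_job is a hypothetical helper name, and the Julia path is passed in as an argument because the real location on the cluster is not known here.

```shell
#!/bin/bash
# Hypothetical helper: run a Julia script from the submission directory.
# $1 = path to the julia executable, $2 = the .jl script to run.
run_julia_job() {
    # PBS_O_WORKDIR is set by PBS/Torque to the directory qsub was called from;
    # outside PBS it is unset, so fall back to the current directory.
    cd "${PBS_O_WORKDIR:-$PWD}" || return 1
    echo "Starting run at: $(date)"
    "$1" "$2"
    status=$?
    echo "Job finished with exit code $status at: $(date)"
    return "$status"
}
```

In the .pbs file this would be called after the #PBS lines, for example as run_julia_job /Applications/bla/bla/julia/bin/julia myscript.jl. Saving $? right after the Julia call keeps the reported exit code tied to Julia rather than to a later command.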

Related

Read job name from bash script parameters in SGE

I am running Sun Grid Engine for submitting jobs, and I want a single bash script that can submit any file I need to run, instead of running a different qsub command with a different bash file for each job. I have been able to generate output and error files that share the name of the input file, but now I am struggling to set a different job name for each file. My approach has been the following:
#!/bin/bash
#
#$ -cwd
#$ -S /bin/bash
#$ -N $1
#
python -u $1 >/output_dir/$1.out 2>/error_dir/$1.error
This way, running qsub send_to_sge.sh foo executes the program, and creates the files foo.error and foo.out with the errors and printouts, respectively. However, the job appears with the name $1 in the SGE queue. Instead, I would like to have foo as the job name. Is there any way to achieve what I am seeking?

Submit job with python code (mpi4py) on HPC cluster

I am working on Python code that uses MPI (mpi4py), and I want to run it across many nodes (each node has 16 processors) in a queue on an HPC cluster.
My code is structured as below:
from mpi4py import MPI
comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()
count = 0
for i in range(1, size):
    if rank == i:
        for j in range(5):
            res = some_function(some_argument)
            comm.send(res, dest=0, tag=count)
I am able to run this code perfectly fine on the head node of the cluster using the command
$ mpirun -np 48 python codename.py
Here codename.py is the name of the Python script, and in this example I am using 48 processors. On the head node, for my specific task, the job takes about 1 second to finish (and it successfully gives the desired output).
However, when I try to submit this exact same code as a job on one of the queues of the HPC cluster, it keeps running for a very long time (many hours) without finishing, and I have to kill the job manually after a day or so. It also doesn't give the expected output.
Here is the pbs file that I am using,
#!/bin/sh
#PBS -l nodes=3:ppn=16
#PBS -N phy
#PBS -m abe
#PBS -l walltime=23:00:00
#PBS -j eo
#PBS -q queue_name
cd $PBS_O_WORKDIR
echo 'This job started on: ' `date`
module load python27-extras
mpirun -np 48 python codename.py
I use the command qsub jobname.pbs to submit the job.
I am confused as to why the code runs perfectly fine on the head node but runs into this problem when I submit it as a job to run across many processors in a queue. I presume that I may need to change the pbs script. I would be really thankful if someone could suggest what I should do to run such an MPI script as a job on a queue in an HPC cluster.
I didn't need to change my code; the fix was in the pbs script. =)
Apparently, I needed to call the appropriate mpirun in the job script, so that the job on the compute nodes uses the same mpirun as the one used on the head node.
This is the line that made the difference: /opt/intel/impi/4.1.1.036/intel64/bin/mpirun
This is the job script that worked:
#!/bin/sh
#PBS -l nodes=3:ppn=16
#PBS -N phy
#PBS -m abe
#PBS -l walltime=23:00:00
#PBS -j eo
#PBS -q queue_name
cd $PBS_O_WORKDIR
export OMP_NUM_THREADS=16
export I_MPI_PIN=off
echo 'This job started on: ' `date`
/opt/intel/impi/4.1.1.036/intel64/bin/mpirun -np 48 python codename.py
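One way to check whether a job picks up the same tools as the head node is to log what each command name resolves to at the start of the job, and compare that with an interactive session. The following is only a small diagnostic sketch; report_tool is a hypothetical helper, not part of PBS or MPI.

```shell
#!/bin/bash
# Hypothetical helper: print the path a command name resolves to
# in the current environment, or "not found" if it is not on PATH.
report_tool() {
    if tool_path=$(command -v "$1" 2>/dev/null); then
        echo "$1 -> $tool_path"
    else
        echo "$1 -> not found"
    fi
}

# In the job script, after any "module load" lines:
report_tool mpirun
report_tool python
```

If the paths printed in the job's output file differ from those on the head node, calling the tool by its full path, as in the script above, removes the ambiguity.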

Hold remainder of shell script commands until PBS qsub array job completes

I am very new to shell scripting, and I am trying to write a shell pipeline that submits multiple qsub jobs, but has several commands to run in between these qsubs, which are contingent on the most recent job completing. I have been researching multiple ways to try and hold the shell script from proceeding after submission of a qsub job, but none have been successful.
The simplest chunk of code I can provide to illustrate the issue is as follows:
THREADS=`wc -l < list1.txt`
qsub -V -t 1-$THREADS firstjob.sh
echo "firstjob.sh completed"
There are obviously other lines of code after this that are actually contingent on firstjob.sh finishing, but I have omitted them here for clarity. I have tried the following methods of pausing/holding the script:
1) Only using wait, which is supposed to stop the script until all background programs are completed. This pushed right past the wait and printed the echo statement to the terminal while the array job was still running. My guess is that this occurs because qsub exits as soon as the job is submitted, so wait thinks it has completed?
qsub -V -t 1-$THREADS firstjob.sh
wait
echo "firstjob.sh completed"
2) Setting the job to a variable, echoing that variable to submit the job, and using the entire job ID along with wait to pause. The echo command should wait until all elements of the array job have completed. The error message is shown following the code, within the code block.
job1=$(qsub -V -t 1-$THREADS firstjob.sh)
echo "$job1"
wait $job1
echo "firstjob.sh completed"
####ERROR RECEIVED####
-bash: wait: `4585057[].cluster-name.local': not a pid or valid job spec
3) Using the -sync y for qsub. This should prevent it from exiting the qsub until the job is complete, acting as an effective pause...I had hoped. Error in comment after the commands. For some reason it is not reading the -sync option correctly?
qsub -V -sync y -t 1-$THREADS firstjob.sh
echo "firstjob.sh completed"
####ERROR RECEIVED####
qsub: script file 'y' cannot be loaded - No such file or directory
4) Using a dummy shell script (the dummy just makes an empty file) so that I could use the -W depend=afterok: option of qsub to pause the script. This again pushes right past to the echo statement without any pause for submitting the dummy script. Both jobs get submitted, one right after the other, no pause.
job1=$(qsub -V -t 1-$THREADS demux.sh)
echo "$job1"
check=$(qsub -V -W depend=afterok:$job1 dummy.sh)
echo "$check"
echo "firstjob.sh completed"
Some further details regarding the script:
Each job submission is an array job.
The pipeline is being run in the terminal using a command resembling the following, so that I may provide it with 3 inputs: source Pipeline.sh -r list1.txt -d /workingDir/ -s list2.txt
I am certain that the firstjob.sh has not actually completed running because I see them in the queue when I use showq.
Perhaps there is an easy fix in most of these scenarios, but being new to all this, I am really struggling. I have to use this method in 8-10 places throughout the script, so it is really hindering progress. Would appreciate any assistance. Thanks.
POST EDIT 1
Here is the code contained in firstjob.sh...though doubtful that it will help. Everything in here functions as expected, always produces the correct results.
#!/bin/bash
#PBS -S /bin/bash
#PBS -N demux
#PBS -l walltime=72:00:00
#PBS -j oe
#PBS -l nodes=1:ppn=4
#PBS -l mem=15gb
module load biotools
cd ${WORKDIR}/rawFQs/
INFILE=`head -$PBS_ARRAYID ${WORKDIR}${RAWFQ} | tail -1`
BASE=`basename "$INFILE" .fq.gz`
zcat $INFILE | fastx_barcode_splitter.pl --bcfile ${WORKDIR}/rawFQs/DemuxLists/${BASE}_sheet4splitter.txt --prefix ${WORKDIR}/fastqs/ --bol --suffix ".fq"
I just tried using -sync y, and that worked for me, so good idea there... Not sure what's different about your setup.
But a couple other things you could try involve your main script knowing the status of the qsub jobs you're running. One idea is that you could have your main script check the status of your job using qstat and wait until it finishes before proceeding.
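The qstat idea could look roughly like the sketch below. It assumes that qstat exits with a non-zero status once the job is no longer known to the scheduler; on clusters that keep completed jobs visible for a while, the loop would only end after that period, or would need to inspect the job state instead. wait_for_job and the QSTAT override are hypothetical names introduced for illustration.

```shell
#!/bin/bash
# Hypothetical helper: block until the scheduler no longer reports the job.
# $1 = job ID as printed by qsub, $2 = seconds between checks (default 30).
# Assumption: "qstat <jobid>" exits non-zero once the job has left the queue.
wait_for_job() {
    job_id="$1"
    interval="${2:-30}"
    # QSTAT can be overridden (for example in tests); it defaults to the real qstat.
    while "${QSTAT:-qstat}" "$job_id" >/dev/null 2>&1; do
        sleep "$interval"
    done
}
```

In the pipeline, the call wait_for_job "$job1" would go right after job1=$(qsub -V -t 1-$THREADS firstjob.sh) and before the echo. A polling interval of 30 seconds or more keeps the load on the scheduler low.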
Alternatively, you could have the first job write to a file as its last step (or, as you suggested, set up a dummy job that waits for the first job to finish). Then in your main script, you can test to see whether that file has been written before going on.
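For an array job, the marker-file idea can be extended so that every array element writes its own file and the main script waits until all of them exist. This is a minimal sketch under assumptions: the marker directory, the .done suffix, and the helper names count_markers and wait_for_markers are all hypothetical.

```shell
#!/bin/bash
# Last line of the job script (one marker per array element), for example:
#   touch "$MARKER_DIR/$PBS_ARRAYID.done"

# Hypothetical helper: count the *.done files in directory $1.
count_markers() {
    n=0
    for f in "$1"/*.done; do
        # When nothing matches, the pattern stays literal and -e is false.
        [ -e "$f" ] && n=$((n + 1))
    done
    echo "$n"
}

# Hypothetical helper: block until directory $1 holds at least $2 markers,
# checking every $3 seconds (default 30).
wait_for_markers() {
    marker_dir="$1"
    expected="$2"
    interval="${3:-30}"
    while [ "$(count_markers "$marker_dir")" -lt "$expected" ]; do
        sleep "$interval"
    done
}
```

The main script would call wait_for_markers with the marker directory and $THREADS after the qsub line. The marker directory has to be on a filesystem shared between the compute nodes and the node running the pipeline, and stale markers from earlier runs should be removed first.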

How can I send a batch job to PBS using a function in Shell?

I can submit a job to PBS using both non-interactive and interactive batch jobs. However, I need to use the PBS commands in a function. In other words, I need a structure like this:
#!/bin/sh
pbs_setup () {
#PBS -l $1
#PBS -N $2
#PBS -q normal
#PBS -A $USER
#PBS -m ae
#PBS -M $USER"@gmail.com"
#PBS -q normal
#PBS -l nodes=1:ppn=8
#PBS
}
pbs_setup "walltime=6:00:00" "step3";
echo " "
echo "Job started"
echo " "
echo "Job Ended"
When I submit this job, it does not work.
My final goal is to separate the job commands from the main body of the code, so that when the HPC system changes I only have to edit one shell file containing this function instead of editing all the scripts. I would appreciate any suggestions.
You could create your own submission command that collects the job options and passes them as command-line parameters to the actual qsub call.
Here is a rather basic example of this. In real usage I would add more sophisticated parameter handling, tailored to the type of jobs and more consistent with the qsub interface. Handling interactive jobs also needs additional work.
submit.sh
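#!/bin/bash
# Optional arguments with defaults: walltime ($2) and job name ($3)
walltime="${2:-06:00:00}"
name="${3:-step3}"
# Site-specific settings; edit these when the cluster changes
queue="normal"
acct="$USER"
mailevents="ae"
mailaddress="$USER@gmail.com"
resources="nodes=1:ppn=8"
if [ $# -lt 1 ] ; then
    echo "Usage: submit.sh script [walltime [name]]" >&2
    exit 1
fi
script="$1"
# -l takes resource=value pairs, so the walltime needs its "walltime=" prefix
qsub -l "walltime=$walltime" -N "$name" -q "$queue" -A "$acct" \
    -m "$mailevents" -M "$mailaddress" -l "$resources" "$script"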
script.sh
#!/bin/bash
echo " "
echo "Job started"
echo " "
echo "Job Ended"
This is supposed to be used as
submit.sh script.sh 06:00:00 step3
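The issue with that job script is that the #PBS lines must come before the first executable command in the script file. qsub stops scanning for directives once it reaches a command, so #PBS lines inside a function body are ignored. The shell itself treats them as plain comments, so $1 and $2 are never expanded in them.
When I attempted the same idea, I used the same kind of function you have, but cat its output and the actual commands into another file; that is, an overarching script creates the 'job' script. You can put the HPC requirements in a separate file and source it from the creation script.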
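The generator approach described above might be sketched as follows. The variable names QUEUE and RESOURCES, the function name make_job_script, and the file names in the usage note are hypothetical; in practice the two settings would live in a separate file that the creation script sources.

```shell
#!/bin/bash
# Site-specific settings (hypothetical names); normally kept in their own
# file and pulled in with "source" so only that file changes with the cluster.
QUEUE="normal"
RESOURCES="nodes=1:ppn=8"

# Hypothetical helper: print a complete job script to standard output.
# $1 = walltime, $2 = job name, $3 = file holding the job's commands.
make_job_script() {
    # Directives come first, before any executable line of the job body.
    cat <<EOF
#!/bin/bash
#PBS -l walltime=$1
#PBS -N $2
#PBS -q $QUEUE
#PBS -l $RESOURCES
EOF
    cat "$3"
}
```

A call such as make_job_script 6:00:00 step3 commands.sh > step3.pbs followed by qsub step3.pbs would then submit the generated file. Because the heredoc is expanded by the shell before qsub ever sees the file, the directives contain literal values rather than unexpanded variables.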
Edit in response to comment:
e.g.
To specify a path to start the job from:
#PBS -d init_path
"working directory path to be used for the job, PBS_O_INITDIR"
Or
#PBS -D root_path
"root directory to be used for the job, PBS_O_ROOTDIR."
Or
#PBS -w working_path
"If the -w option is not specified, the default working directory is the current directory. This option sets the environment variable PBS_O_WORKDIR."
So, by default, PBS_O_WORKDIR is the directory you are in when you run qsub to submit the script.
Thus, if you set the specific options (d, D, w) for paths relative to the actual script running environment, you'll be able to use the paths you intend.
For specifics, including default values of these and other options, check the qsub man page for your system. If you are using the Torque version of the PBS system, it is available at linux.die.net - qsub

Directly pass parameters to pbs script

Is there a way to directly pass parameters to a .pbs script before submitting a job? I need to loop over a list of files indicated by different numbers and apply a script to analyze each file.
The best I've been able to come up with is the following:
#!/bin/bash
for ((i= 1; i<= 10; i++))
do
export FILENUM=$i
qsub pass_test.pbs
done
where pass_test.pbs is the following script:
#!/bin/sh
#PBS -V
#PBS -S /bin/sh
#PBS -N pass_test
#PBS -l nodes=1:ppn=1,walltime=00:02:00
#PBS -M XXXXXX@XXX.edu
cd /scratch/XXXXXX/pass_test
./run_test $FILENUM
But this feels a bit wonky. Particularly, I want to avoid having to create an environment variable to handle this.
The qsub utility can read the script from the standard input, so by using a here document you can create scripts on the fly, dynamically:
#!/bin/sh
for i in `seq 1 10`
do
cat <<EOS | qsub -
#!/bin/sh
#PBS -V
#PBS -S /bin/sh
#PBS -N pass_test
#PBS -l nodes=1:ppn=1,walltime=00:02:00
#PBS -M XXXXXX@XXX.edu
cd /scratch/XXXXXX/pass_test
./run_test $i
EOS
done
Personally, I would use a more compact version:
#!/bin/sh
for i in `seq 1 10`
do
cat <<EOS | qsub -V -S /bin/sh -N pass_test -l nodes=1:ppn=1,walltime=00:02:00 -M XXXXXX@XXX.edu -
cd /scratch/XXXXXX/pass_test
./run_test $i
EOS
done
You can use qsub's -F option, which the documentation describes as follows:
-F
Specifies the arguments that will be passed to the job script when the script is launched. The accepted syntax is:
qsub -F "myarg1 myarg2 myarg3=myarg3value" myscript2.sh
Note: Quotation marks are required. qsub will fail with an error
message if the argument following -F is not a quoted value. The
pbs_mom server will pass the quoted value as arguments to the job
script when it launches the script.
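On the receiving side, the job script sees these values as ordinary positional parameters. The sketch below shows one way a script such as myscript2.sh might tell plain arguments apart from name=value pairs; show_args is a hypothetical helper used only for illustration.

```shell
#!/bin/bash
# Hypothetical helper: classify each argument the job script received.
show_args() {
    for arg in "$@"; do
        case "$arg" in
            # name=value: split at the first "=" sign
            *=*) echo "option ${arg%%=*} has value ${arg#*=}" ;;
            *)   echo "positional $arg" ;;
        esac
    done
}

# In the job script itself this would be called as: show_args "$@"
```

With the qsub -F line quoted above, the script would receive myarg1, myarg2, and myarg3=myarg3value as $1, $2, and $3.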
If you only need to pass numbers and run the same command for each input file number, a job array is better than a for loop because it puts less load on the job scheduler.
In the .pbs file, refer to the file number through PBS_ARRAYID:
./run_test ${PBS_ARRAYID}
Then submit it from the command line with:
qsub -t 1-10 pass_test.pbs
The -t option specifies which array IDs to run.
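When the inputs are not simply numbered, the array ID can instead index into a list of file names, one per line. A minimal sketch, assuming a hypothetical list file files.txt and a hypothetical helper nth_line:

```shell
#!/bin/bash
# Hypothetical helper: print line number $2 of file $1.
nth_line() {
    sed -n "${2}p" "$1"
}

# Inside the job script, PBS_ARRAYID holds this task's index (set by qsub -t):
#   input=$(nth_line files.txt "$PBS_ARRAYID")
#   ./run_test "$input"
```

The range given to -t should then match the number of lines in the list, for example qsub -t 1-10 for a ten-line file.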
