PBS torque email variable

Here is an example of a PBS script that I use:
#!/bin/bash
#PBS -S /bin/bash
#PBS -N myJob
#PBS -l nodes=1:ppn=4
#PBS -l walltime=50:00:00
#PBS -q route
export MYMAIL=mytestmail@testmail.com
#PBS -m ae
#PBS -M mytestmail@testmail.com
./script1.sh
echo $PBS_JOBID $PBS_O_WORKDIR | mail -s "$PBS_JOBNAME script1 done" $MYMAIL
./script2.sh
echo $PBS_JOBID $PBS_O_WORKDIR | mail -s "$PBS_JOBNAME script2 done" $MYMAIL
./script3.sh
echo $PBS_JOBID $PBS_O_WORKDIR | mail -s "$PBS_JOBNAME script3 done" $MYMAIL
./script4.sh
As you can see, I want to receive notifications during the process.
My problem is that users must write their email address twice.
I tried:
#PBS -M $MYMAIL
but it does not work.
I also tried to find a PBS variable containing the email address stored by
#PBS -M mytestmail@testmail.com
but found nothing.
Any ideas?

You are sending different emails by different methods. With the #PBS -M line you are telling pbs_server on the head node where it should send emails about the job, and with "| mail" you are sending mail to the user from the node running the job.
It seems that Torque does not set an environment variable that contains the contents of -M, so we can't pass that to mail.
I have two ideas for you. The first is to capture the Mail_Users line from qstat and parse it:
qstat -f [job number] | grep Mail_Users
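For example, something like this inside the job script might work (a sketch, untested: it assumes qstat is available on the compute nodes and that qstat -f prints a single "Mail_Users = address" line for the job):
MYMAIL=$(qstat -f "$PBS_JOBID" | sed -n 's/^[[:space:]]*Mail_Users = //p')
echo $PBS_JOBID $PBS_O_WORKDIR | mail -s "$PBS_JOBNAME script1 done" "$MYMAIL"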
The second is to create a .forward file for each user, since Torque emails the local user account by default, which eliminates the #PBS -M line. You will still need to pass an email address or account name to mail, but you may be able to get away with:
mail -s "$PBS_JOBNAME script1 done" `whoami`
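Setting that up is a one-time step per user; assuming the cluster's mail delivery honors ~/.forward, something like this should do (the address is a placeholder):
echo "mytestmail@testmail.com" > ~/.forward
After that, both Torque's own notifications and the mail calls reach the user's real address without it ever appearing in the job script.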

Related

PBS Scheduling on Allocating One Node

I am trying to request two nodes in a cluster setting; however, when I print ${PBS_NODEFILE}, only one node is visible. I am running this batch script on the login node. Any suggestions as to why I am only seeing one node?
#PBS -S /bin/bash
#PBS -V
#PBS -W block=true
#PBS -l nodes=2:ppn=12
#PBS -l walltime=01:00:00
#PBS -N resnet50
#PBS -A MyProject
echo "The nodefile for this job is stored at ${PBS_NODEFILE}"
cat ${PBS_NODEFILE}
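A minimal check (a sketch; myscript.pbs is a placeholder name, and it assumes the script is submitted with qsub rather than executed directly on the login node, since PBS_NODEFILE is only populated inside a running job):
qsub myscript.pbs
# then, in the job's stdout file, this should print two distinct host names
# (for nodes=2:ppn=12 the nodefile itself has 24 lines, 12 per host):
sort -u ${PBS_NODEFILE}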

how to run a bash script from the PBS script on the head node, after running the program on the compute nodes

current working script (script-A)
#!/bin/bash
#PBS -N test7
#PBS -q batch
#PBS -l nodes=1:ppn=6,walltime=00:30:00
#PBS -j oe
cd $PBS_O_WORKDIR
mpirun -np 6 /home/sai/1QE/qe-6.5/bin/pw.x < si.scf.in > 92scf.out
What I want:
I want to run a bash script, analysis.sh, on the HEAD NODE after running the above job on the compute node.
e.g. script-B
#!/bin/bash
#PBS -N test7
#PBS -q batch
#PBS -l nodes=1:ppn=6,walltime=00:30:00
#PBS -j oe
cd $PBS_O_WORKDIR
mpirun -np 6 /home/sai/1QE/qe-6.5/bin/pw.x < si.scf.in > 92scf.out
bash analysis.sh
Problem
Script-B above would also be fine, but not in my case.
My problem is that the analysis program is installed only on my head node, not on the compute nodes,
so it will work only on the head node.
So, is there any way to run the analysis.sh script on the head node after the PBS script runs on the compute node?
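One common workaround is sketched below (it assumes passwordless ssh from the compute nodes back to the head node, and "headnode" is a placeholder for the real hostname): hop back to the head node at the end of the job script.
#!/bin/bash
#PBS -N test7
#PBS -q batch
#PBS -l nodes=1:ppn=6,walltime=00:30:00
#PBS -j oe
cd $PBS_O_WORKDIR
mpirun -np 6 /home/sai/1QE/qe-6.5/bin/pw.x < si.scf.in > 92scf.out
# run the analysis on the head node, where the analysis program is installed
ssh headnode "cd $PBS_O_WORKDIR && bash analysis.sh"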

Virtual memory allocation in cluster - command line

I'm running a code on a computer cluster with 24 nodes, each with 12 processors and about 64 GB of memory. The commands I'm using to launch it are the following:
#!/bin/sh
#PBS -N cclit
#PBS -l walltime=288:00:00
#PBS -l nodes=1:ppn=1
#PBS -j oe
#PBS -m n
#PBS -l mem=60000mb
Unfortunately I realized that my code needs at least 120000mb of virtual memory. What I tried was to modify the above commands as follows:
#!/bin/sh
#PBS -N cclit
#PBS -l walltime=288:00:00
#PBS -l nodes=2:ppn=2
#PBS -j oe
#PBS -m n
#PBS -l mem=120000mb
But it doesn't seem to work: it stops again at the same point, telling me that virtual memory is not sufficient.
My code is not parallelized, meaning that only one processor is needed. What happens when the memory of a node is totally used? I guess I'm doing something wrong with '#PBS -l mem=120000mb', or probably I need some other command. I tried to look for a solution on the web but didn't find anything.
Can you help me?
Thanks, Mirko.
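One thing worth checking (a guess; it depends on how the site has configured Torque): if the error message is about virtual memory, the resource to raise may be vmem rather than mem. Also, adding nodes cannot help a serial code, since a single process can only use the memory of the one node it runs on, so the request has to fit on a single node:
#PBS -l nodes=1:ppn=1
#PBS -l vmem=120000mb
(vmem may exceed a node's physical RAM if the node has swap; site policy decides what is allowed.)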

running a job in unix

I have the following small script, myjob.qsub:
#!/bin/sh -login
#PBS -l walltime=00:15:00
#PBS -l nodes=1:ppn=1
#PBS -l mem=2gb
#PBS -N myrun05168
/myexecutable >mylog.log
I did make it executable by:
chmod u+x myexecutable
When I try to run it by changing to the directory containing the executable and then submitting the job:
qsub myjob.qsub
it gives me an error: no /myexecutable file or directory.
I tried to use "./":
#!/bin/sh -login
#PBS -l walltime=00:15:00
#PBS -l nodes=1:ppn=1
#PBS -l mem=2gb
#PBS -N myrun05168
./myexecutable >mylog.log
but that does not help.
When I just run my executable directly on the command line, it works:
./myexecutable
But I cannot run it that way, since this job needs to be submitted as a job on the cluster.
Any suggestions?
You need to give the full path to the executable. I assume it isn't actually in your root directory; maybe it should be /home/username/myexecutable.
Your script runs with -login; is that needed?
You should change your script to use a relative pathname (relative to your home directory, where batch jobs start), like:
myruns/p_runs/Fw2010/seed1/myexecutable >mylog.log
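Alternatively, since Torque starts batch jobs in the user's home directory rather than in the directory where qsub was run, changing into $PBS_O_WORKDIR first lets the ./ form work:
#!/bin/sh -login
#PBS -l walltime=00:15:00
#PBS -l nodes=1:ppn=1
#PBS -l mem=2gb
#PBS -N myrun05168
# PBS_O_WORKDIR is the directory qsub was invoked from
cd $PBS_O_WORKDIR
./myexecutable > mylog.log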

SunGridEngine, Condor, Torque as Resource Managers for PVM

Does anyone have any idea which resource manager is good for PVM? Or should I not have used PVM and instead relied on MPI (or some implementation of it, such as MPICH-2; are there any others that are better)? The main reason for using PVM was that the person before me who started this project assumed the use of PVM. However, now that this project is mine (he hasn't done any significant work that relies on PVM), this can easily be changed, preferably to something that is easy to install, because installing and setting up PVM was a big hassle.
I'm leaning towards SunGridEngine, since I have dedicated hardware, and after reading another post on which ones are better for dedicated hardware, SGE seems to be the winner. However, I'm unsure of its performance with PVM. Has anyone had any experience with PVM and SGE?
If you use SGE, what do you use to communicate from computer to computer (or virtual machine to virtual machine)?
Oh, and I will be running Perl applications/lines, if this matters.
Any suggestions or ideas?
Thanks in advance to all comments,
Tyug
I run PVM on Linux systems using Torque, SGE and LSF without any problems. Are you asking "Is it possible to use SGE, Torque, etc. to run PVM applications?"?
If so, check out my example Linux C-shell job scripts below. Note that the scripts are nearly identical, except for the header of each script, which conforms to the appropriate format for each resource manager.
SGE job script:
#!/bin/csh
#$ -N LTR-001
#$ -o LTR-001.output
#$ -e LTR-001.error
#$ -pe comp 24
#$ -l h_rt=04:00:00
#$ -A cmit2
#$ -cwd
#$ -V
# Set up environment
setenv LD_LIBRARY_PATH /lfs0/projects/cmit2/opt-intel/overture-noX/lib:${LD_LIBRARY_PATH}
setenv PVM_ARCH LINUX
setenv PVM_ROOT /lfs0/projects/cmit2/opt-intel/pvm3
setenv PVM_BIN ${PVM_ROOT}/bin
setenv PVM_RSH /usr/bin/ssh
setenv MY_HOSTS pvm_hostfile
rm -f ~/.pvmprofile
env | grep PVM_ > ~/.pvmprofile
# Create file containing _unique_ host names. Note that there are two possible sources of available hosts
sort -k 1,1 -u ${MACHINE_FILE} >! ${MY_HOSTS}
# Start PVM & add nodes
printf "%s\n%s\n" conf quit|${PVM_ROOT}/lib/pvm ${MY_HOSTS}
wait
sleep 2
#
# Run apps requiring PVM.
#
wait
# Exit PVM daemon
echo "reset" | $PVM_ROOT/lib/pvm
echo "halt" | $PVM_ROOT/lib/pvm
Torque job script:
#!/bin/csh
#PBS -N LTR-001
#PBS -o LTR-001.output
#PBS -e LTR-001.error
#PBS -l nodes=3:ppn=8
#PBS -l walltime=04:00:00
#PBS -q compute
#PBS -d .
# Set up environment
setenv LD_LIBRARY_PATH /users/ps14/opt-intel/overture/lib:${LD_LIBRARY_PATH}
setenv PVM_ARCH LINUX64
setenv PVM_ROOT /users/ps14/opt-intel/pvm3
setenv PVM_BIN ${PVM_ROOT}/bin
setenv PVM_RSH ${PVM_ROOT}/ssh
setenv MY_HOSTS pvm_hostfile
rm -f ~/.pvmprofile
env | grep PVM_ > ~/.pvmprofile
# Create file containing _unique_ host names. Note that there are two possible sources of available hosts
sort -k 1,1 -u ${PBS_NODEFILE} >! ${MY_HOSTS}
# Start PVM & add nodes
printf "%s\n%s\n" conf quit|${PVM_ROOT}/lib/pvm ${MY_HOSTS}
wait
sleep 2
#
# Run apps requiring PVM.
#
wait
# Exit PVM daemon
echo "reset" | $PVM_ROOT/lib/pvm
echo "halt" | $PVM_ROOT/lib/pvm
