How to know the PBS batch job submit time inside the script being executed? - bash

I'm using the PBS qsub to run a script on a cluster that must output a report file named with the batch job submit time.
The batch job submit time is the time at which the job joins the PBS batch job queue.
I checked all the default PBS variables but didn't find anything related to the job submit time.
I would like to know how I can get this time without creating a new input variable.
Thanks.

I figured this out myself.
Add the following function to your PBS batch job script to get the job submit time.
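getsubmitdate() {
    # Pull the qtime (queue time) line from the full job status;
    # fields 3-7 hold the weekday, month, day, time and year.
    local datestring
    datestring=$(qstat -f "$PBS_JOBID" | grep -F qtime | awk '{for (i = 3; i < 8; i++) printf "%s ", $i}')
    # Reformat as YYYYMMDD (date -d requires GNU date).
    local result
    result=$(date -d "$datestring" +%Y%m%d)
    local outputvar=$1
    if [[ "$outputvar" ]]; then
        # Assign the result to the variable named by the caller.
        printf -v "$outputvar" '%s' "$result"
    else
        echo "$result"
    fi
}
getsubmitdate SUBMITDATE
echo "$SUBMITDATE"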
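As a rough sketch of how the submit date could feed the report file name the question asks about, the parsing can be split into small functions that are easy to check without a scheduler. The names extract_qtime and report_name and the report_ prefix are hypothetical, the sample timestamp layout is only what the awk field range above implies, and date -d assumes GNU date.

```shell
#!/bin/bash
# Hypothetical helpers; not part of PBS itself.

# Read `qstat -f` output on stdin and print the timestamp from the qtime line.
extract_qtime() {
    sed -n 's/^[[:space:]]*qtime = //p'
}

# Turn a timestamp such as "Tue Mar  5 10:15:30 2019" into a report file name.
report_name() {
    printf 'report_%s.txt\n' "$(date -d "$1" +%Y%m%d)"
}

# Inside a PBS job script this could be used as:
#   qtime=$(qstat -f "$PBS_JOBID" | extract_qtime)
#   ./make_report > "$(report_name "$qtime")"    # make_report is a placeholder
```

Keeping the extraction separate from the qstat call means the parsing can be tried on saved qstat -f output before it is used inside a job.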

Related

How to grep the output of a command inside a shell script when scheduling using cron

I have a simple shell script that checks whether my EMR job is running and prints a log message. It works properly when I run it manually, but not when it is scheduled with cron: the "status_live" variable is always empty, so the script always takes the if branch. Can anyone suggest what is wrong here?
#!/bin/sh
status_live=$(yarn application -list | grep -i "Streaming App")
if [ -z $status_live ]
then
echo "Running spark streaming job again at: "$(date) &
else
echo "Spark Streaming job is running, at: "$(date)
fi
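Your script fails under cron because cron runs jobs with a minimal environment: your login profile is not loaded, so PATH and other variables the script relies on may be missing.
For example, try running your script as another user, such as nobody, that has no login shell.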
sudo -u nobody <script-full-path>
It will fail because it has no environment context.
The solution is to add your user environment to your script by sourcing your .bash_profile on the line after the shebang:
sed -i "1a . $HOME/.bash_profile" <script-full-path>
Your script should look like:
#!/bin/sh
# Use "." rather than "source": "source" is a bash extension and /bin/sh may not be bash
. /home/<your user name>/.bash_profile
status_live=$(yarn application -list | grep -i "Streaming App")
if [ -z "$status_live" ]
then
echo "Running spark streaming job again at: "$(date) &
else
echo "Spark Streaming job is running, at: "$(date)
fi
Now try running it again as user nobody; if it works, then cron will work as well.
sudo -u nobody <script-full-path>
Note that cron does not attach your job to a terminal, so you will need to redirect the script's standard output to a log file.
<script-full-path> >> <logfile-full-path>
# $? holds the exit status of the last command.
# grep exits with 0 when it finds a match, so status_live is 0 (true in a shell if condition) when the app is listed.
yarn application -list | grep -i "Streaming App"
status_live=$?
echo "status_live: ${status_live}"
if [ "$status_live" -eq 0 ]; then
    echo "success"
else
    echo "fail"
fi
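A minimal sketch that combines the two answers above: test grep's exit status directly with grep -q instead of capturing text in an unquoted variable. The function names are hypothetical, and it assumes yarn is reachable through PATH under cron, which is what the profile step above is meant to ensure.

```shell
#!/bin/sh
# Succeeds if the text on stdin contains the given name (case-insensitive).
app_is_listed() {
    grep -qi -- "$1"
}

# Print one of the two log messages from the question, based on the yarn listing.
check_streaming_app() {
    if yarn application -list | app_is_listed "Streaming App"; then
        echo "Spark Streaming job is running, at: $(date)"
    else
        echo "Running spark streaming job again at: $(date)"
    fi
}
```

Calling check_streaming_app at the end of the script and redirecting with >> <logfile-full-path> 2>&1 in the crontab entry would also capture any "command not found" message, which makes environment problems visible in the log.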

Question on using bwait to wait for multiple bsub jobs to finish

I am new to using LSF (been using PBS/Torque all along).
I need to write code/logic to make sure all bsub jobs finish before other commands/jobs can be fired.
Here is what I have done: I have a master shell script which calls multiple other shell scripts via bsub commands. I capture the job ids from bsub in a log file and I need to ensure that all jobs get finished before the downstream shell script should execute its other commands.
Master shell script
#!/bin/bash
...Code not shown for brevity..
"Command 1 invoked with multiple bsubs" > log_cmd_1.txt
Need Code logic to use bwait before downstream Commands can be used
"Command 2 will be invoked with multiple bsubs" > log_cmd_2.txt
and so on
stdout captured from Command 1 within the Master Shell script is stored in log_cmd_1.txt which looks like this
Submitting Sample 101
Job <545> is submitted to .
Submitting Sample 102
Job <546> is submitted to .
Submitting Sample 103
Job <547> is submitted to .
Submitting Sample 104
Job <548> is submitted to .
I have used the code block shown below after Command 1 in the master shell script.
However, it does not seem to work for my situation. I suspect I have gotten the whole approach wrong.
while sleep 30m;
do
#the below gets the JobId from the log_cmd_1.txt and tries bwait
grep '^Job' <path_to>/log_cmd_1.txt | perl -pe 's!.*?<(\d+)>.*!$1!' | while read -r line; do res=$(bwait -w "done($line)");echo $res; done 1> <path_to>/running.txt;
# the below sed command deletes blank and whitespace-only lines
sed '/^\s*$/d' running.txt > running2.txt;
# -s file check operator means "file is not zero size"
if [ -s $WORK_DIR/logs/running2.txt ]
then
echo "Jobs still running";
else
echo "Jobs complete";
break;
fi
done
The question: what is the correct way to do this using bwait within the master shell script?
Thanks in advance.
bwait blocks until the condition is satisfied, so the loops are probably not necessary. Note that since you're using done, if a job fails then bwait will exit and inform you that the condition can never be satisfied. Make sure to check for that case.
What you have should work. At least the following test worked for me.
#!/bin/bash
# "Command 1 invoked with multiple bsubs" > log_cmd_1.txt
( bsub sleep 0; bsub sleep 0 ) > log_cmd_1.txt
# Need Code logic to use bwait before downstream Commands can be used
while sleep 1
do
#the below gets the JobId from the log_cmd_1.txt and tries bwait
grep '^Job' log_cmd_1.txt | perl -pe 's!.*?<(\d+)>.*!$1!' | while read -r line; do res=$(bwait -w "done($line)");echo "$res"; done 1> running.txt;
# the below sed command deletes blank and whitespace-only lines
sed '/^\s*$/d' running.txt > running2.txt;
# -s file check operator means "file is not zero size"
if [ -s running2.txt ]
then
echo "Jobs still running";
else
echo "Jobs complete";
break;
fi
done
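Since bwait blocks on its own, the polling loop could in principle be replaced by a single call. The sketch below builds one wait condition from the job IDs in the log. The function names are hypothetical, and it assumes bwait accepts the same dependency-expression syntax as bsub -w, including &&; check the documentation for your LSF version before relying on it.

```shell
#!/bin/bash
# Read bsub output on stdin and print "done(ID) && done(ID) ...".
wait_condition_from_log() {
    sed -n 's/^Job <\([0-9][0-9]*\)>.*/\1/p' |
        awk '{ printf "%sdone(%s)", sep, $0; sep = " && " } END { print "" }'
}

# Block until every job recorded in the given log file satisfies done().
wait_for_logged_jobs() {
    local condition
    condition=$(wait_condition_from_log < "$1")
    [ -n "$condition" ] || { echo "no job IDs found in $1" >&2; return 1; }
    bwait -w "$condition"
}
```

In the master script this might be called as wait_for_logged_jobs log_cmd_1.txt || exit 1, so that the downstream commands are skipped when bwait reports that the condition cannot be satisfied.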
Another way to do it, which may be a little cleaner, is to use job arrays and job dependencies. A job array combines several pieces of work so that they can be managed as a single job. So your
"Command 1 invoked with multiple bsubs" > log_cmd_1.txt
could be submitted as a single job array. You'll need a driver script that can launch the individual jobs. Here's an example driver script.
$ cat runbatch1.sh
#!/bin/bash
# $LSB_JOBINDEX goes from 1 to 10
if [ "$LSB_JOBINDEX" -eq 1 ]; then
# do the work for job batch 1, job 1
...
elif [ "$LSB_JOBINDEX" -eq 2 ]; then
# etc
...
fi
Then you can submit the job array like this.
bsub -J 'batch1[1-10]' sh runbatch1.sh
This command will run 10 job array elements. The driver script's environment will include the variable LSB_JOBINDEX, which tells you which element the driver is running. Since the array has a name, batch1, it's easier to manage. You can submit a second job array that won't start until all elements of the first have completed successfully. The second array is submitted with this command.
bsub -w 'done(batch1)' -J 'batch2[1-10]' sh runbatch2.sh
I hope that this helps.

How to determine how much time an autosys job takes to go to the "success" state

Can anyone help me? I am trying to find the current status of an autosys job like this:
autorep -j jobname -d -L0 | grep "RUNNING"
If it is not running, I have to force start the same job.
After the job has been restarted successfully, I have to wait until its status is SUCCESS before continuing with the rest of the commands in my shell script. Please guide me; many thanks for your help. Below is my code:
status=`autorep -j jobname -d -L0 | grep "RUNNING" | awk '{print $1}'`
echo "$status"
if [ "$status" = "RUNNING" ]; then
echo "The job is in RUNNING state"
else
echo "Force starting the job now !"
fsj jobname
fi
Once the job status is SUCCESS, I can continue with the rest of my script. But how do I know whether to sleep for 30 minutes, 40 minutes, and so on (the job runs for approximately 30 minutes)?
Is there any way to trigger the next command automatically after the job succeeds, instead of using sleep?
Thanks again.
while sleep 900; do autorep -j jobname -d -L0 |
awk '$1~/RUNNING/{r=1} END{if(!r) system("fsj jobname")}'; done &
This background loop will check the first column of your autorep output for "RUNNING" every 15 minutes, executing fsj jobname if it doesn't find a match. You might also be able to track the PID of the relevant process if you can identify it.
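To continue only once the job reports SUCCESS, one option is to poll with a bounded loop rather than a fixed sleep. This is a sketch, not autosys-specific guidance: wait_for_status is a hypothetical helper, it reuses the autorep invocation from the question, and it assumes the status word appears in that output only while it is the job's current status.

```shell
#!/bin/bash
# Poll `autorep` until its output contains the wanted status word.
# Arguments: job name, status word, seconds between checks, maximum number of checks.
# Returns 0 once the status is seen, 1 if it never appears within the limit.
wait_for_status() {
    local job=$1 wanted=$2 interval=$3 max_tries=$4
    local tries=0
    while [ "$tries" -lt "$max_tries" ]; do
        if autorep -j "$job" -d -L0 | grep -q "$wanted"; then
            return 0
        fi
        tries=$((tries + 1))
        sleep "$interval"
    done
    return 1
}
```

For example, wait_for_status jobname SUCCESS 60 90 && next_command would check once a minute for up to 90 minutes and run next_command (a placeholder) as soon as the status appears, instead of always sleeping for the worst case.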

Hold remainder of shell script commands until PBS qsub array job completes

I am very new to shell scripting, and I am trying to write a shell pipeline that submits multiple qsub jobs, but has several commands to run in between these qsubs, which are contingent on the most recent job completing. I have been researching multiple ways to try and hold the shell script from proceeding after submission of a qsub job, but none have been successful.
The simplest chunk of code I can provide to illustrate the issue is as follows:
THREADS=`wc -l < list1.txt`
qsub -V -t 1-$THREADS firstjob.sh
echo "firstjob.sh completed"
There are obviously other lines of code after this that are actually contingent on firstjob.sh finishing, but I have omitted them here for clarity. I have tried the following methods of pausing/holding the script:
1) Only using wait, which is supposed to stop the script until all background programs are completed. This pushed right past the wait and printed the echo statement to the terminal while the array job was still running. My guess is this is occurring because once the qsub job is submitted, it exits and wait thinks it has completed?
qsub -V -t 1-$THREADS firstjob.sh
wait
echo "firstjob.sh completed"
2) Setting the job to a variable, echoing that variable to submit the job, and using the entire job ID along with wait to pause. The echo command should wait until all elements of the array job have completed. The error message is shown following the code, within the code block.
job1=$(qsub -V -t 1-$THREADS firstjob.sh)
echo "$job1"
wait $job1
echo "firstjob.sh completed"
####ERROR RECEIVED####
-bash: wait: `4585057[].cluster-name.local': not a pid or valid job spec
3) Using the -sync y for qsub. This should prevent it from exiting the qsub until the job is complete, acting as an effective pause...I had hoped. Error in comment after the commands. For some reason it is not reading the -sync option correctly?
qsub -V -sync y -t 1-$THREADS firstjob.sh
echo "firstjob.sh completed"
####ERROR RECEIVED####
qsub: script file 'y' cannot be loaded - No such file or directory
4) Using a dummy shell script (the dummy just makes an empty file) so that I could use the -W depend=afterok: option of qsub to pause the script. This again pushes right past to the echo statement without any pause for submitting the dummy script. Both jobs get submitted, one right after the other, no pause.
job1=$(qsub -V -t 1-$THREADS demux.sh)
echo "$job1"
check=$(qsub -V -W depend=afterok:$job1 dummy.sh)
echo "$check"
echo "firstjob.sh completed"
Some further details regarding the script:
Each job submission is an array job.
The pipeline is being run in the terminal using a command resembling the following, so that I may provide it with 3 inputs: source Pipeline.sh -r list1.txt -d /workingDir/ -s list2.txt
I am certain that firstjob.sh has not actually finished running, because I still see its jobs in the queue when I use showq.
Perhaps there is an easy fix in most of these scenarios, but being new to all this, I am really struggling. I have to use this method in 8-10 places throughout the script, so it is really hindering progress. Would appreciate any assistance. Thanks.
POST EDIT 1
Here is the code contained in firstjob.sh...though doubtful that it will help. Everything in here functions as expected, always produces the correct results.
#!/bin/bash
#PBS -S /bin/bash
#PBS -N demux
#PBS -l walltime=72:00:00
#PBS -j oe
#PBS -l nodes=1:ppn=4
#PBS -l mem=15gb
module load biotools
cd ${WORKDIR}/rawFQs/
INFILE=`head -$PBS_ARRAYID ${WORKDIR}${RAWFQ} | tail -1`
BASE=`basename "$INFILE" .fq.gz`
zcat $INFILE | fastx_barcode_splitter.pl --bcfile ${WORKDIR}/rawFQs/DemuxLists/${BASE}_sheet4splitter.txt --prefix ${WORKDIR}/fastqs/ --bol --suffix ".fq"
I just tried using -sync y, and that worked for me, so good idea there... Not sure what's different about your setup.
But a couple other things you could try involve your main script knowing the status of the qsub jobs you're running. One idea is that you could have your main script check the status of your job using qstat and wait until it finishes before proceeding.
Alternatively, you could have the first job write to a file as its last step (or, as you suggested, set up a dummy job that waits for the first job to finish). Then in your main script, you can test to see whether that file has been written before going on.
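The marker-file idea can be sketched as follows for an array job. It assumes each array element runs touch "$MARKER_DIR/$PBS_ARRAYID.done" as its last step; MARKER_DIR, the .done suffix, and wait_for_markers are all hypothetical names, and the directory would need to be on a filesystem shared between the compute nodes and the host running the pipeline.

```shell
#!/bin/bash
# Poll until at least `expected` marker files (*.done) exist in `dir`.
# Arguments: directory, expected count, seconds between checks, maximum number of checks.
# Returns 1 if the markers have not all appeared after `max_tries` checks.
wait_for_markers() {
    local dir=$1 expected=$2 interval=$3 max_tries=$4
    local tries=0 count
    while [ "$tries" -lt "$max_tries" ]; do
        # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
        count=$(ls "$dir" 2>/dev/null | grep -c '\.done$' || true)
        if [ "$count" -ge "$expected" ]; then
            return 0
        fi
        tries=$((tries + 1))
        sleep "$interval"
    done
    return 1
}
```

In the pipeline this could follow the submission, for example wait_for_markers "$MARKER_DIR" "$THREADS" 60 1440 before the echo. The directory should be emptied before each submission so that markers from an earlier run are not counted.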

Using for loop with qsub for batch job submission

Could I please be advised how I could use a for loop to qsub files for batch job submission?
At the moment, this only works if I submit a single file for job submission using the command:
qsub -v /path/to/file.txt script.sh
However if I run a for loop through files using the following commands:
files=`pwd`/*pattern*  # This gives a list of files that share a common pattern in their names
for i in $files;
do
qsub -v $i script.sh
done
This always gets rejected with the error that the file.txt was not provided.
I have double checked if $i from the for loop is providing the right file.txt by doing:
for i in $files;
do
echo $i
done
and this works out fine. As such I am unsure why the for loop with qsub is not working. Could I please get advice on how I could alter the code to get it to work?
Thanks.
Using -v requires you to give the variable a name: qsub -v filepath=$i script.sh. You can then access the file path inside script.sh as $filepath.
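Putting that together with the loop from the question, a sketch might look like this. submit_matching is a hypothetical wrapper, and filepath is simply the variable name chosen in the answer above. Iterating over the glob directly, with the loop variable quoted, also keeps file names that contain spaces intact.

```shell
#!/bin/bash
# Submit script.sh once per file in `dir` whose name matches the glob `pattern`.
submit_matching() {
    local dir=$1 pattern=$2 f
    # $pattern is left unquoted on purpose so that the shell expands the glob.
    for f in "$dir"/$pattern; do
        [ -e "$f" ] || continue    # skip the literal pattern when nothing matches
        qsub -v filepath="$f" script.sh
    done
}
```

It could be called as submit_matching "$(pwd)" '*pattern*', with the pattern quoted at the call site so that it is expanded inside the function rather than by the caller.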
