Hold remainder of shell script commands until PBS qsub array job completes - shell

I am very new to shell scripting, and I am trying to write a shell pipeline that submits multiple qsub jobs, but has several commands to run in between these qsubs, which are contingent on the most recent job completing. I have been researching multiple ways to try and hold the shell script from proceeding after submission of a qsub job, but none have been successful.
The simplest chunk of code I can provide to illustrate the issue is as follows:
THREADS=`wc -l < list1.txt`
qsub -V -t 1-$THREADS firstjob.sh
echo "firstjob.sh completed"
There are obviously other lines of code after this that are actually contingent on firstjob.sh finishing, but I have omitted them here for clarity. I have tried the following methods of pausing/holding the script:
1) Only using wait, which is supposed to stop the script until all background programs are completed. This pushed right past the wait and printed the echo statement to the terminal while the array job was still running. My guess is this is occurring because once the qsub job is submitted, is exits and wait thinks it has completed?
qsub -V -t 1-$THREADS firstjob.sh
wait
echo "firstjob.sh completed"
2) Setting the job to a variable, echoing that variable to submit the job, and using the the entire job ID along with wait to pause. The echo command should wait until all elements of the array job have completed.The error message is shown following the code, within the code block.
job1=$(qsub -V -t 1-$THREADS firstjob.sh)
echo "$job1"
wait $job1
echo "firstjob.sh completed"
####ERROR RECEIVED####
-bash: wait: `4585057[].cluster-name.local': not a pid or valid job spec
3) Using the -sync y for qsub. This should prevent it from exiting the qsub until the job is complete, acting as an effective pause...I had hoped. Error in comment after the commands. For some reason it is not reading the -sync option correctly?
qsub -V -sync y -t 1-$THREADS firstjob.sh
echo "firstjob.sh completed"
####ERROR RECEIVED####
qsub: script file 'y' cannot be loaded - No such file or directory
4) Using a dummy shell script (the dummy just makes an empty file) so that I could use the -W depend=afterok: option of qsub to pause the script. This again pushes right past to the echo statement without any pause for submitting the dummy script. Both jobs get submitted, one right after the other, no pause.
job1=$(qsub -V -t 1-$THREADS demux.sh)
echo "$job1"
check=$(qsub -V -W depend=afterok:$job1 dummy.sh)
echo "$check"
echo "firstjob.sh completed"
Some further details regarding the script:
Each job submission is an array job.
The pipeline is being run in the terminal using a command resembling the following, so that I may provide it with 3 inputs: source Pipeline.sh -r list1.txt -d /workingDir/ -s list2.txt
I am certain that the firstjob.sh has not actually completed running because I see them in the queue when I use showq.
Perhaps there is an easy fix in most of these scenarios, but being new to all this, I am really struggling. I have to use this method in 8-10 places throughout the script, so it is really hindering progress. Would appreciate any assistance. Thanks.
POST EDIT 1
Here is the code contained in firstjob.sh...though doubtful that it will help. Everything in here functions as expected, always produces the correct results.
\#! /bin/bash
\#PBS -S /bin/bash
\#PBS -N demux
\#PBS -l walltime=72:00:00
\#PBS -j oe
\#PBS -l nodes=1:ppn=4
\#PBS -l mem=15gb
module load biotools
cd ${WORKDIR}/rawFQs/
INFILE=`head -$PBS_ARRAYID ${WORKDIR}${RAWFQ} | tail -1`
BASE=`basename "$INFILE" .fq.gz`
zcat $INFILE | fastx_barcode_splitter.pl --bcfile ${WORKDIR}/rawFQs/DemuxLists/${BASE}_sheet4splitter.txt --prefix ${WORKDIR}/fastqs/ --bol --suffix ".fq"

I just tried using -sync y, and that worked for me, so good idea there... Not sure what's different about your setup.
But a couple other things you could try involve your main script knowing the status of the qsub jobs you're running. One idea is that you could have your main script check the status of your job using qstat and wait until it finishes before proceeding.
Alternatively, you could have the first job write to a file as its last step (or, as you suggested, set up a dummy job that waits for the first job to finish). Then in your main script, you can test to see whether that file has been written before going on.

Related

Bash Jobs : how can I output job process information along with errors?

I'm using a bash script to import some data into a MySQL database. I'm parallelising the loads using jobs in...
for i in *.sql; do
{
echo "importing $i"
mysql --defaults-extra-file=my.cnf < $i
echo "$i load complete
} &
done
If one of the jobs errors, I see the error in stdout, but nothing to identify which of the jobs caused it. Is there a way to add the job process information to the error output so I which which one it was?
You can try using GNU Parallel:
parallel --progress --tag 'mysql --defaults-extra-file=my.cnf < {}' ::: *.sql > import.log
This will run the mysql --defaults-extra-file=my.cnf as many times as there are .sql files in the current directory, in parallel. You can decide on the maximum number of parallel processes with the --jobs option.
The --tag option prefixes each output line with the file being processed. The output is redirected to a file for further reference, and the --bar options shows a progress bar for a visual feedback.

How to run an Inotify shell script as an asynchronous process

I have an inotify shell script which monitors a directory, and executes certain commands if a new file comes in. I need to make this inotify script into a parallelized process, so the execution of the script doesn't wait for the process to complete whenever multiple files comes into the directory.
I have tried using nohup, & and xargs to achieve this task. But the problem was, xargs runs the same script as a number of processes, whenever a new file comes in, all the running n processes try to process the script. But essentially I only want one of the processes to process the new file whichever is idle. Something like worker pool, whichever worker is free or idle tries to execute the task.
This is my shell script.
#!/bin/bash
# script.sh
inotifywait --monitor -r -e close_write --format '%w%f' ./ | while read FILE
do
echo "started script";
sleep $(( $RANDOM % 10 ))s;
#some more process which takes time when a new file comes in
done
I did try to execute the script like this with xargs =>
xargs -n1 -P3 bash sample.sh
So whenever a new file comes in, it is getting processed thrice because of P3, but ideally i want one of the processes to pick this task which ever is idle.
Please shed some light on how to approach this problem?
There is no reason to have a pool of idle processes. Just run one per new file when you see new files appear.
#!/bin/bash
inotifywait --monitor -r -e close_write --format '%w%f' ./ |
while read -r file
do
echo "started script";
( sleep $(( $RANDOM % 10 ))s
#some more process which takes time when a new "$file" comes in
) &
done
Notice the addition of & and the parentheses to group the sleep and the subsequent processing into a single subshell which we can then background.
Also, notice how we always prefer read -r and Correct Bash and shell script variable capitalization
Maybe this will work:
https://www.gnu.org/software/parallel/man.html#EXAMPLE:-GNU-Parallel-as-dir-processor
If you have a dir in which users drop files that needs to be processed you can do this on GNU/Linux (If you know what inotifywait is called on other platforms file a bug report):
inotifywait -qmre MOVED_TO -e CLOSE_WRITE --format %w%f my_dir |
parallel -u echo
This will run the command echo on each file put into my_dir or subdirs of my_dir.
To run at most 5 processes use -j5.

question on using bwait to wait for multiple bsub jobs to finish

I am new to using LSF (been using PBS/Torque all along).
I need to write code/logic to make sure all bsub jobs finish before other commands/jobs can be fired.
Here is what I have done: I have a master shell script which calls multiple other shell scripts via bsub commands. I capture the job ids from bsub in a log file and I need to ensure that all jobs get finished before the downstream shell script should execute its other commands.
Master shell script
#!/bin/bash
...Code not shown for brevity..
"Command 1 invoked with multiple bsubs" > log_cmd_1.txt
Need Code logic to use bwait before downstream Commands can be used
"Command 2 will be invoked with multiple bsubs" > log_cmd_2.txt
and so on
stdout captured from Command 1 within the Master Shell script is stored in log_cmd_1.txt which looks like this
Submitting Sample 101
Job <545> is submitted to .
Submitting Sample 102
Job <546> is submitted to .
Submitting Sample 103
Job <547> is submitted to .
Submitting Sample 104
Job <548> is submitted to .
I have used the codeblock shown below after Command 1 in the master shell script.
However, it does not seem to work for my situation. Looks like I would have gotten the whole thing wrong below.
while sleep 30m;
do
#the below gets the JobId from the log_cmd_1.txt and tries bwait
grep '^Job' <path_to>/log_cmd_1.txt | perl -pe 's!.*?<(\d+)>.*!$1!' | while read -r line; do res=$(bwait -w "done($line)");echo $res; done 1>
<path_to>/running.txt;
# the below sed command deletes lines that start with Space
sed '/^\s*$/d' running.txt > running2.txt;
# -s file check operator means "file is not zero size"
if [ -s $WORK_DIR/logs/running2.txt ]
then
echo "Jobs still running";
else
echo "Jobs complete";
break;
fi
done
The question: What's the correct way to do this using bwait within the master shell script.
Thanks in advance.
bwait will block until the condition is satisfied, so the loops are probably not neecessary. Note that since you're using done, if the job fails then bwait will exit and inform you that the condition can never be satisfied. Make sure to check that case.
What you have should work. At least the following test worked for me.
#!/bin/bash
# "Command 1 invoked with multiple bsubs" > log_cmd_1.txt
( bsub sleep 0; bsub sleep 0 ) > log_cmd_1.txt
# Need Code logic to use bwait before downstream Commands can be used
while sleep 1
do
#the below gets the JobId from the log_cmd_1.txt and tries bwait
grep '^Job' log_cmd_1.txt | perl -pe 's!.*?<(\d+)>.*!$1!' | while read -r line; do res=$(bwait -w "done($line)");echo "$res"; done 1> running.txt;
# the below sed command deletes lines that start with Space
sed '/^\s*$/d' running.txt > running2.txt;
# -s file check operator means "file is not zero size"
if [ -s running2.txt ]
then
echo "Jobs still running";
else
echo "Jobs complete";
break;
fi
done
Another way to do it. Which may is a little cleaner, is to use job arrays and job dependencies. Job arrays will combine several pieces of work that can be managed as a single job. So your
"Command 1 invoked with multiple bsubs" > log_cmd_1.txt
could be submitted as a single job array. You'll need a driver script that can launch the individual jobs. Here's an example driver script.
$ cat runbatch1.sh
#!/bin/bash
# $LSB_JOBINDEX goes from 1 to 10
if [ "$LSB_JOBINDEX" -eq 1 ]; then
# do the work for job batch 1, job 1
...
elif [ "$LSB_JOBINDEX" -eq 2 ]; then
# etc
...
fi
Then you can submit the job array like this.
bsub -J 'batch1[1-10]' sh runbatch1.sh
This command will run 10 job array elements. The driver script's environment will use the variable LSB_JOB_INDEX to let you know which element the driver is running. Since the array has a name, batch, it's easier to manage. You can submit a second job array that won't start until all elements of the first have completed successfully. The second array is submitted with this command.
bsub -w 'done(batch1)' -J 'batch2[1-10]' sh runbatch2.sh
I hope that this helps.

Cron job won't start again after I stopped it?

I wrote a script to run constantly on startup. If for whatever reason the script were to fail, I wrote a second script to check if it has failed, and if so, run the first script again. I then set this second script as a cronjob to run every minute so that it is constantly checking if the first script is alive.
So to test this, I reboot my system. I can see in htop that the first script is running from start up as expected. Good. I kill the process to test the second script. Sure enough, the second script starts the first script again. Still good. I then kill this process, but the second script won't run again now. It still updates a txt file when I manually start the first script, but the second script just doesn't start the first script like it's supposed to. Is it because I killed the cronjob? Restarting the cron service doesn't fix anything though, so I don't know why my second script isn't running again at all.
First script:
#!/bin/bash
stamp=$(date +%Y%m%d-%H%M)
timeout 10d tcpdump -i eth0 -s 96 -z gzip -C 10 -w /home/user/Documents/${stamp}
Second script:
#!/bin/bash
echo "not running" > /home/working.txt
if (( $(ps -ef | grep -v grep | grep tcpdump.sh | wc -l) > 0 ))
then
echo "tcpdump is running!!!" > /home/working.txt
else
/usr/local/bin/tcpdump.sh start
fi
Any help?
You would probably be better off running a simple for loop as the main script, and that kicks off the tcpdump script in the background, so something like:
#!/bin/bash
while true; do
if ps -ef | grep -v grep | grep -q tcpdump; then
: tcpdump running OK
else
# tcpdump not running - start it off
nohup /usr/local/bin/firstscript.sh start &
fi
sleep 30
done
This checks that "tcpdump.sh" is in the output of the "ps -ef" command - if it is, then do nothing (note that you must have an actual command between the "then" and "else" - the ":" command, which just takes it s arguments and ignores them, is sufficient). If it isn't running, start the first script in the background. Then sleep 30 seconds and check again. (Yes, I could have inverted the test so that I didn't need an empty "then" arm, but it would have made the code less obvious)
You put this script as the one which starts at boot time.
Edit: Do you really want to check for "tcpdump.sh"? Is that what the first script is actually called? Assuming that you actually want to check for the tcpdump program, you could use:
if pgrep tcpdump; then

Terminal Application to Keep Web Server Process Alive

Is there an app that can, given a command and options, execute for the lifetime of the process and ping a given URL indefinitely on a specific interval?
If not, could this be done on the terminal as a bash script? I'm almost positive it's doable through terminal, but am not fluent enough to whip it up within a few minutes.
Found this post that has a portion of the solution, minus the ping bits. ping runs on linux, indefinitely; until it's actively killed. How would I kill it from bash after say, two pings?
General Script
As others have suggested, use this in pseudo code:
execute command and save PID
while PID is active, ping and sleep
exit
This results in following script:
#!/bin/bash
# execute command, use '&' at the end to run in background
<command here> &
# store pid
pid=$!
while ps | awk '{ print $1 }' | grep $pid; do
ping <address here>
sleep <timeout here in seconds>
done
Note that the stuff inside <> should be replaces with actual stuff. Be it a command or an ip address.
Break from Loop
To answer your second question, that depends in the loop. In the loop above, simply track the loop count using a variable. To do that, add a ((count++)) inside the loop. And do this: [[ $count -eq 2 ]] && break. Now the loop will break when we're pinging for a second time.
Something like this:
...
while ...; do
...
((count++))
[[ $count -eq 2 ]] && break
done
ping twice
To ping only a few times, use the -c option:
ping -c <count here> <address here>
Example:
ping -c 2 www.google.com
Use man ping for more information.
Better practice
As hek2mgl noted in a comment below, the current solution may not suffice to solve the problem. While answering the question, the core problem will still persist. To aid to that problem, a cron job is suggested in which a simple wget or curl http request is sent periodically. This results in a fairly easy script containing but one line:
#!/bin/bash
curl <address here> > /dev/null 2>&1
This script can be added as a cron job. Leave a comment if you desire more information how to set such a scheduled job. Special thanks to hek2mgl for analyzing the problem and suggesting a sound solution.
Say you want to start a download with wget and while it is running, ping the url:
wget http://example.com/large_file.tgz & #put in background
pid=$!
while kill -s 0 $pid #test if process is running
do
ping -c 1 127.0.0.1 #ping your adress once
sleep 5 #and sleep for 5 seconds
done
A nice little generic utility for this is Daemonize. Its relevant options:
Usage: daemonize [OPTIONS] path [arg] ...
-c <dir> # Set daemon's working directory to <dir>.
-E var=value # Pass environment setting to daemon. May appear multiple times.
-p <pidfile> # Save PID to <pidfile>.
-u <user> # Run daemon as user <user>. Requires invocation as root.
-l <lockfile> # Single-instance checking using lockfile <lockfile>.
Here's an example of starting/killing in use: flickd
To get more sophisticated, you could turn your ping script into a systemd service, now standard on many recent Linuxes.

Resources