Custom job script submission to PBS via Dask? - cluster-computing

I have a PBS job script with an executable that writes its results to an out file.
### some lines
PBS_O_EXEDIR="path/to/software"
EXECUTABLE="executablefile"
OUTFILE="out"
### Copy application directory on compute node
[ -d $PBS_O_EXEDIR ] || mkdir -p $PBS_O_EXEDIR
[ -w $PBS_O_EXEDIR ] && \
rsync -Cavz --rsh=$SSH $HOST:$PBS_O_EXEDIR `dirname $PBS_O_EXEDIR`
[ -d $PBS_O_WORKDIR ] || mkdir -p $PBS_O_WORKDIR
rsync -Cavz --rsh=$SSH $HOST:$PBS_O_WORKDIR `dirname $PBS_O_WORKDIR`
# Change into the working directory
cd $PBS_O_WORKDIR
# Save the jobid in the outfile
echo "PBS-JOB-ID was $PBS_JOBID" > $OUTFILE
# Run the executable
$PBS_O_EXEDIR/$EXECUTABLE >> $OUTFILE
In my project, I have to use Dask to submit these jobs and to monitor them. Therefore, I have configured the jobqueue.yaml file like this:
jobqueue:
  pbs:
    name: htc_calc
    # Dask worker options
    cores: 4         # Total number of cores per job
    memory: 50GB     # Total amount of memory per job
    # PBS resource manager options
    shebang: "#!/usr/bin/env bash"
    walltime: '00:30:00'
    exe_dir: "/home/r/rb11/softwares/FPLO/bin"
    excutable: "fplo18.00-57-x86_64"
    outfile: "out"
    job-extra: "exe_dir/executable >> outfile"
However, I got this error while submitting jobs via Dask.
qsub: directive error: e
tornado.application - ERROR - Exception in callback functools.partial(<function wrap.<locals>.null_wrapper at 0x7f3d8c4a56a8>, <Task finished coro=<SpecCluster._correct_state_internal() done, defined at /home/r/rb11/anaconda3/envs/htc/lib/python3.5/site-packages/distributed/deploy/spec.py:284> exception=RuntimeError('Command exited with non-zero exit code.\nExit code: 1\nCommand:\nqsub /tmp/tmpwyvkfcmi.sh\nstdout:\n\nstderr:\nqsub: directive error: e \n\n',)>)
Traceback (most recent call last):
File "/home/r/rb11/anaconda3/envs/htc/lib/python3.5/site-packages/tornado/ioloop.py", line 758, in _run_callback
ret = callback()
File "/home/r/rb11/anaconda3/envs/htc/lib/python3.5/site-packages/tornado/stack_context.py", line 300, in null_wrapper
return fn(*args, **kwargs)
File "/home/r/rb11/anaconda3/envs/htc/lib/python3.5/site-packages/tornado/ioloop.py", line 779, in _discard_future_result
future.result()
File "/home/r/rb11/anaconda3/envs/htc/lib/python3.5/asyncio/futures.py", line 294, in result
raise self._exception
File "/home/r/rb11/anaconda3/envs/htc/lib/python3.5/asyncio/tasks.py", line 240, in _step
result = coro.send(None)
File "/home/r/rb11/anaconda3/envs/htc/lib/python3.5/site-packages/distributed/deploy/spec.py", line 317, in _correct_state_internal
await w # for tornado gen.coroutine support
File "/home/r/rb11/anaconda3/envs/htc/lib/python3.5/site-packages/distributed/deploy/spec.py", line 41, in _
await self.start()
File "/home/r/rb11/anaconda3/envs/htc/lib/python3.5/site-packages/dask_jobqueue/core.py", line 285, in start
out = await self._submit_job(fn)
File "/home/r/rb11/anaconda3/envs/htc/lib/python3.5/site-packages/dask_jobqueue/core.py", line 268, in _submit_job
return self._call(shlex.split(self.submit_command) + [script_filename])
File "/home/r/rb11/anaconda3/envs/htc/lib/python3.5/site-packages/dask_jobqueue/core.py", line 368, in _call
"stderr:\n{}\n".format(proc.returncode, cmd_str, out, err)
RuntimeError: Command exited with non-zero exit code.
Exit code: 1
Command:
qsub /tmp/tmpwyvkfcmi.sh
stdout:
stderr:
qsub: directive error: e
How do I specify a custom bash script in Dask?

Dask is used for distributing Python applications. In the case of Dask Jobqueue it works by submitting a scheduler and workers to the batch system, which connect together to form their own cluster. You can then submit Python work to the Dask scheduler.
From your example, it looks like you are trying to use the cluster setup configuration to run your own bash application instead of Dask.
In order to do this with Dask you should return the jobqueue config to the defaults and instead write a Python function which calls your bash script.
import os
from dask_jobqueue import PBSCluster
from dask.distributed import Client

cluster = PBSCluster()
cluster.scale(jobs=10)  # Deploy ten single-node jobs

client = Client(cluster)  # Connect this local process to the remote workers
client.submit(os.system, "/path/to/your/script")  # Run the script on one of the workers
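A slightly fuller sketch of the same approach (the resource options, the work directory, and the use of subprocess are assumptions on my part; the executable path is the one from your jobqueue.yaml):
import subprocess
from dask_jobqueue import PBSCluster
from dask.distributed import Client

# Placeholder resource options; use whatever matches your site and jobqueue.yaml
cluster = PBSCluster(cores=4, memory="50GB", walltime="00:30:00")
cluster.scale(jobs=10)
client = Client(cluster)

def run_fplo(workdir):
    # Executable path taken from the question's config; the work directory is hypothetical
    exe = "/home/r/rb11/softwares/FPLO/bin/fplo18.00-57-x86_64"
    with open(workdir + "/out", "a") as outfile:
        return subprocess.run([exe], cwd=workdir, stdout=outfile).returncode

future = client.submit(run_fplo, "/path/to/workdir")  # runs once, on a single worker
print(future.result())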
However, it seems like Dask may not be a good fit for what you are trying to do; you would probably be better off just submitting your job to PBS normally.
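For reference, submitting your existing job script to PBS directly is just (assuming it is saved as job.sh, a hypothetical name):
qsub job.sh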

Related

baseDir issue with nextflow

This might be a very basic question for you guys; however, I have just started with Nextflow and I am struggling with the simplest example.
I will first explain what I have done and then the problem.
Aim: I aim to make a workflow for my bioinformatics analyses like the one here (https://www.nextflow.io/example4.html)
Background: I have installed all the packages that were needed and they all work from the console without any error.
My run: I have used the same script as in the example, only replacing the directory names. Here is how I have arranged the directories.
location of script
~/raman/nflow/script.nf
location of Fastq files
~/raman/nflow/Data/T4_1.fq.gz
~/raman/nflow/Data/T4_2.fq.gz
Location of transcriptomic file
~/raman/nflow/Genome/trans.fa
The script
#!/usr/bin/env nextflow

/*
 * The following pipeline parameters specify the reference genomes
 * and read pairs and can be provided as command line options
 */
params.reads = "$baseDir/Data/T4_{1,2}.fq.gz"
params.transcriptome = "$baseDir/HumanGenome/SalmonIndex/gencode.v42.transcripts.fa"
params.outdir = "results"

workflow {
    read_pairs_ch = channel.fromFilePairs( params.reads, checkIfExists: true )
    INDEX(params.transcriptome)
    FASTQC(read_pairs_ch)
    QUANT(INDEX.out, read_pairs_ch)
}

process INDEX {
    tag "$transcriptome.simpleName"

    input:
    path transcriptome

    output:
    path 'index'

    script:
    """
    salmon index --threads $task.cpus -t $transcriptome -i index
    """
}

process FASTQC {
    tag "FASTQC on $sample_id"
    publishDir params.outdir

    input:
    tuple val(sample_id), path(reads)

    output:
    path "fastqc_${sample_id}_logs"

    script:
    """
    fastqc "$sample_id" "$reads"
    """
}

process QUANT {
    tag "$pair_id"
    publishDir params.outdir

    input:
    path index
    tuple val(pair_id), path(reads)

    output:
    path pair_id

    script:
    """
    salmon quant --threads $task.cpus --libType=U -i $index -1 ${reads[0]} -2 ${reads[1]} -o $pair_id
    """
}
Output:
(base) ntr#ser:~/raman/nflow$ nextflow script.nf
N E X T F L O W ~ version 22.10.1
Launching `script.nf` [modest_meninsky] DSL2 - revision: 032a643b56
executor > local (2)
executor > local (2)
[- ] process > INDEX (gencode) -
[28/02cde5] process > FASTQC (FASTQC on T4) [100%] 1 of 1, failed: 1 ✘
[- ] process > QUANT -
Error executing process > 'FASTQC (FASTQC on T4)'
Caused by:
Missing output file(s) `fastqc_T4_logs` expected by process `FASTQC (FASTQC on T4)`
Command executed:
fastqc "T4" "T4_1.fq.gz T4_2.fq.gz"
Command exit status:
0
Command output:
(empty)
Command error:
Skipping 'T4' which didn't exist, or couldn't be read
Skipping 'T4_1.fq.gz T4_2.fq.gz' which didn't exist, or couldn't be read
Work dir:
/home/ruby/raman/nflow/work/28/02cde5184f4accf9a05bc2ded29c50
Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`
I believe I have an issue with my understanding of baseDir. I am assuming that baseDir is the directory where my script.nf file is. I am not sure what is going wrong or how I can fix it.
Could anyone please help or guide me?
Thank you
Caused by:
Missing output file(s) `fastqc_T4_logs` expected by process `FASTQC (FASTQC on T4)`
Nextflow complains when it can't find the declared output files. This can occur even if the command completes successfully, i.e. with exit status 0. The problem here is that fastqc simply skips files that don't exist or can't be read (e.g. permissions problems), but it does produce these warnings:
Skipping 'T4' which didn't exist, or couldn't be read
Skipping 'T4_1.fq.gz T4_2.fq.gz' which didn't exist, or couldn't be read
The solution is to just make sure all files exist. Note that the fromFilePairs factory method produces a list of files in the second element. Therefore quoting a space-separated pair of filenames is also problematic. All you need is:
script:
"""
fastqc ${reads}
"""

Nextflow: Missing output file(s) expected by process

I'm currently making a start on using Nextflow to develop a bioinformatics pipeline. Below, I've created a params.files variable which contains my FASTQ files, and then fed this into the fasta_files channel.
The process trimming and its script take this channel as the input; ideally, I would then output all of the "$sample"_trimmed.fq.gz files into the output channel, trimmed_channel. However, when I run this script, I get the following error:
Missing output file(s) `trimmed_files` expected by process `trimming` (1)
The nextflow script I'm trying to run is:
#! /usr/bin/env nextflow

params.files = files("$baseDir/FASTQ/*.fastq.gz")
println "fastq files for trimming:$params.files"
fasta_files = Channel.fromPath(params.files)
println "files in the fasta channel: $fasta_files"

process trimming {
    input:
    file fasta_file from fasta_files

    output:
    path trimmed_files into trimmed_channel

    // the shell script to be run:
    """
    #!/usr/bin/env bash
    mkdir trimming_report
    cd /home/usr/Nextflow

    #Finding and renaming my FASTQ files
    for file in FASTQ/*.fastq.gz; do
        [ -f "\$file" ] || continue
        name=\$(echo "\$file" | awk -F'[/]' '{ print \$2 }') #renaming fastq files.
        sample=\$(echo "\$name" | awk -F'[.]' '{ print \$1 }') #renaming fastq files.
        echo "Found" "\$name" "from:" "\$sample"
        if [ ! -e FASTQ/"\$sample"_trimmed.fq.gz ]; then
            trim_galore -j 8 "\$file" -o FASTQ #trim the files
            mv "\$file"_trimming_report.txt trimming_report #moves to the directory trimming report
        else
            echo ""\$sample".trimmed.fq.gz exists skipping trim galore"
        fi
    done

    trimmed_files="FASTQ/*_trimmed.fq.gz"
    echo \$trimmed_files
    """
}
The script in the process works fine. However, I'm wondering if I'm misunderstanding or missing something obvious. If I've forgotten to include something, please let me know; any help is appreciated!
Nextflow does not export the variable trimmed_files into its own scope unless you tell it to do so using the env output qualifier; however, doing it that way would not be very idiomatic.
Since you know the pattern of your output files ("FASTQ/*_trimmed.fq.gz"), simply pass that pattern as the output:
path "FASTQ/*_trimmed.fq.gz" into trimmed_channel
Some things you currently do, but probably want to avoid (a revised sketch of the process follows this list):
- Changing directory inside your NF process: don't do this, as it entirely breaks the whole concept of Nextflow's work-folder setup.
- Writing a bash loop inside an NF process: if you set up your channels correctly, there should only be one task per spawned process.
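Putting those two points together, a rough sketch of the trimming process with one task per input file might look like this (untested; the trim_galore call is taken from your loop and the output pattern from the suggestion above):
process trimming {
    input:
    file fasta_file from fasta_files

    output:
    path "FASTQ/*_trimmed.fq.gz" into trimmed_channel

    """
    mkdir -p FASTQ
    trim_galore -j 8 "${fasta_file}" -o FASTQ
    """
}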
Pallie has already provided some sound advice and, of course, the right answer, which is: environment variables must be declared using the env qualifier.
However, given your script definition, I think there might be some misunderstanding about how best to skip the execution of previously generated results. The cache directive is enabled by default, and when the pipeline is launched with the -resume option, additional attempts to execute a process using the same set of inputs will cause the process execution to be skipped and the stored data to be produced as the actual results.
This example uses the Nextflow DSL 2 for my convenience, but is not strictly required:
nextflow.enable.dsl=2

params.fastq_files = "${baseDir}/FASTQ/*.fastq.gz"
params.publish_dir = "./results"

process trim_galore {

    tag { "${sample}:${fastq_file}" }

    publishDir "${params.publish_dir}/TrimGalore", saveAs: { fn ->
        fn.endsWith('.txt') ? "trimming_reports/${fn}" : fn
    }

    cpus 8

    input:
    tuple val(sample), path(fastq_file)

    output:
    tuple val(sample), path('*_trimmed.fq.gz'), emit: trimmed_fastq_files
    path "${fastq_file}_trimming_report.txt", emit: trimming_report

    """
    trim_galore \\
        -j ${task.cpus} \\
        "${fastq_file}"
    """
}

workflow {

    Channel.fromPath( params.fastq_files )
        | map { tuple( it.getSimpleName(), it ) }
        | set { sample_fastq_files }

    results = trim_galore( sample_fastq_files )

    results.trimmed_fastq_files.view()
}
Run using:
nextflow run script.nf \
-ansi-log false \
--fastq_files '/home/usr/Nextflow/FASTQ/*.fastq.gz'
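If a run is interrupted, or you add new FASTQ files later, re-launching with -resume will use the cache and only run trim_galore for inputs it has not already processed, for example:
nextflow run script.nf \
    -resume \
    -ansi-log false \
    --fastq_files '/home/usr/Nextflow/FASTQ/*.fastq.gz'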

Stop a bash script when an awk command returns with failure

I have the following command
ads2 cls create
This command might return one of two outputs: a reasonable one that looks like this:
kernel with pid 7148 (port 9011) killed
kernel with pid 9360 (port 9011) killed
probing service daemon # http://fdt-c-vm-0093.fdtech.intern:9010
starting kernel FDT-C-VM-0093 # http://fdt-c-yy-0093.ssbt.intern:9011 name=FDT-C-VM-0093 max_consec_timeouts=10 clustermode=Standard hostname=FDT-C-VM-0093 framerate=20000 schedmode=Standard rtaddr=fdt-c-vm-0093.fdtech.ssbt tickrole=Local tickmaster=local max_total_timeouts=1000
kernel FDT-C-VM-0093 running
probing service daemon # http://172.16.xx.xx:9010
starting kernel FDT-C-AGX-0004 # http://172.16.xx.xx:9011 name=FDT-C-AGX-0004 max_consec_timeouts=10 clustermode=Standard hostname=FDT-C-AGX-0004 framerate=20000 schedmode=Standard rtaddr=172.16.xx.xx tickrole=Local tickmaster=local max_total_timeouts=1000
kernel Fxx-x-xxx-xxx4 running
>>> start cluster establish ...
>>> cluster established ...
nodes {
    node {
        name = "FDT-C-VM-xxxx";
        address = "http://fxx-x-xx-0093.xxx.intern:xxxx/";
        state = "3";
    }
    node {
        name = "xxx-x-xxx-xxx";
        address = "http://1xx.16.xx.xx:9011/";
        state = "3";
    }
}
and an unreasonable one that would be:
kernel with pid 8588 (port 9011) killed
failed to probe service daemon # http://xxx-c-agx-0002.xxxx.intern:90xx
In both cases, I'm piping this output to awk in order to check the state of the nodes when a reasonable output is returned; otherwise, it should exit the whole script (line 28).
ads2 cls create | awk -F [\"] ' BEGIN{code=1}   # Set the field delimiter to a double quote
/^>>> cluster established .../ {
    strt=1                       # If the line starts with ">>> cluster established ...", set a variable strt to 1
}
strt!=1 {
    next                         # If strt is not equal to 1, skip to the next line
}
$1 ~ "name" {
    cnt++;                       # If the first field contains name, increment a cnt variable
    nam[cnt]=$2                  # Use the cnt variable as the index of an array called nam with the second field the value
}
$1 ~ "state" {
    stat[cnt]=$2;                # When the first field contains "state", set up another array called stat
    print "Node "nam[cnt]" has state "$2    # Print the node name as well as the state
}
END {
    if (stat[1]=="3" && stat[2]=="3") {
        print "\033[32m" "Success" "\033[37m"   # At the end of processing, the array is used to determine whether there is a success or failure.
    }
28  else {
29      print "\033[31m" "Failed. Check Nodes in devices.dev file" "\033[37m"
30      exit code
    }
}'
some other commands...
Note that this code block is a part of a bash script.
All I'm trying to do is stop the whole script (the rest of the following commands) from continuing to execute when it goes into line 29, where the exit code on line 30 should actually do the job. However, it's not working. In other words, it does print the statement Failed. Check Nodes in devices.dev file, but it then continues executing the next commands, while I expect the script to stop, since the exit command on line 30 should also have been executed.
I suspect your subject, Stop a bash script from inside an awk command, is what's getting you downvotes, as trying to control what the shell that called awk does from inside the awk script is something you can't and shouldn't try to do. That would be a bad case of Inversion Of Control, like calling a function in C to do something and having that function decide to exit the whole program instead of just returning a failure status so the calling code can decide what to do about that failure (e.g. perform recovery actions and then call that function again).
You seem to be confusing exiting your awk script with exiting your shell script. If you want to exit your shell script when the awk script exits with a failure status then you need to write the shell code to tell the shell to do so, e.g.:
whatever | awk 'script' || exit 1
or to get fancy about it:
whatever | awk 'script' || { ret="$?"; printf 'awk exited with status %d\n' "$ret" >&2; exit "$ret"; }
For example:
$ cat tst.sh
#!/usr/bin/env bash
date | awk '{exit 1}' || { ret="$?"; printf 'awk exited with status %d\n' "$ret" >&2; exit 1; }
echo "we should not get here"
$ ./tst.sh
awk exited with status 1
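Applied to your case, that pattern would look something like this (a sketch; the awk program is the one you already have, unchanged):
ads2 cls create | awk -F [\"] '
    # ... your awk program from above, unchanged ...
' || {
    echo "node check failed, stopping script" >&2
    exit 1
}
# the commands below now run only if the awk check exited with status 0
some other commands...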

Difference in behavior between shell and script

I have a set of commands that I am attempting to run in a script. To be exact, the lines are
rm tmp_pipe
mkfifo tmp_pipe
python listen_pipe.py &
while [ true ]; do nc -l -w30 7036 >>tmp_pipe; done &
listen_pipe.py is simply
if __name__ == "__main__":
    f = open("tmp_pipe")
    vals = " "
    while "END" not in vals:
        vals = f.readline()
        if len(vals) > 0:
            print(vals)
        else:
            f = open("tmp_pipe")
If I run the commands in the order shown, I get my desired output, which is a connection to an ESP device that streams motion data. The connection resets after 30 seconds if the ESP device leaves the network range or if the device is turned off. The Python script continues to read from the pipe and does not terminate when the TCP connection is reset. However, if I run this code inside a script file, nc fails to connect and the device remains in an unconnected state indefinitely. The script is just:
#!/bin/bash
rm tmp_pipe
mkfifo tmp_pipe
python listen_pipe.py &
while [ true ]; do nc -l -w30 7036 >>tmp_pipe; done &
This is being run on Ubuntu 16.04. Any suggestions are greatly welcome; I have been fighting with this code all day. Thanks,
Ian

which loop in bash script

I am quite new to bash, but I need to create a simple script which will do the steps below:
Wait 1 minute
A) bash script will use CM to generate result file
B) check row 8 in result file (to know if Administrator is running any jobs or not)
   if NO jobs:
      C) bash script will use CM to start cube refresh
      D) wait 1 minute
      D1) Remove result file
      E) generate result file
      E1) Read row 8
          no jobs:
             F) remove result file
             G) EXIT
          yes:
             I) Go to D)
   YES:
      E) Wait 1 minute
      F) Remove result file
      Go to A)
As bash doesn't have goto (or it should not be used), I tried a few loops, but I am not sure which I should choose.
I know how to:
- start the cube (step C)
- generate the result file (steps A & E)
- check line 8:
sed '8!d' /abc_uat/cmlogs/adm_jobs_u1.log
The condition for the loops will probably be similar to this: != 'Owner = Administrator'
But how do I avoid goto?
I tried a while/do loop, but I am not sure what I should add in case of a false condition; I added an else, but I am not sure about it:
sleep 60
Generate result file with admin jobs (which admin runs inside of 3rd party tool)
while [ sed '8!d' admin_jobs_result_file.log !="Owner = Administrator" ];
do
--NO Admin jobs
START CUBE REFRESH (it will start admin job)
sleep 60
REMOVE RESULT FILE (OLD)
GENERATE RESULT FILE
while [ sed '8!d' admin_jobs_result_file.log = "Owner = Administrator" ];
--Admin is still running cube refresh
do
sleep 60
REMOVE RESULT FILE (OLD)
GENERATE RESULT FILE
-- it should continue checking every 1 minute if admin is still running cube refresh job, so I hope it will go back to while condition
else
done
else
-- Admin is running something
sleep 60
REMOVE RESULT FILE (OLD)
GENERATE RESULT FILE
-it should check result file again but I think it will finish loop
done
You can replace goto with a loop; a while loop, for example.
Syntax:
while <condition>
do
    action
done
Check out cron jobs. If possible, delegate the "wait for a minute" task to cron; cron should worry about running your script in a timely fashion.
You may consider writing two scripts instead of one.
Do you really need to create a result file? Do you know about piping? (No offense, just mentioning it because you said you were fairly new to bash.)
Hopefully this is self-explanatory.
result_file=admin_jobs_result_file.log

function generate {
    logmsg sleeping
    sleep 60
    rm -f "$result_file"
    logmsg generating
    # use CM to generate result file
}

function owner_is_administrator {
    # if line 8 contains "Owner = Administrator", exit success
    # else exit failure
    sed -n '8 {/Owner = Administrator/ q 0; q 1}' "$result_file"
}

function logmsg { date "+%Y-%m-%d %T -- $*"; }

##############
generate

while owner_is_administrator; do
    generate
done

# at this point, line 8 does NOT contain "Owner = Administrator"
logmsg start cube refresh
# use CM to start cube refresh

generate
while owner_is_administrator; do
    generate
done

logmsg Done
Looks like AIX's sed can't exit with a specified status. Try this instead:
function owner_is_administrator {
    # if line 8 contains "Owner = Administrator", exit success
    # else exit failure
    awk 'NR == 8 {if (/Owner = Administrator/) {exit 0} else {exit 1}}' "$result_file"
}
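And if you take up the earlier suggestion of letting cron handle the one-minute wait, a minimal crontab entry might look like this (the script and log paths are hypothetical):
# run the check/refresh script every minute
* * * * * /path/to/check_refresh.sh >> /path/to/check_refresh.log 2>&1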
