New to nextflow, attempted to run a loop in nextflow chunk to remove extension from sequence file names and am running into a syntax error.
params.rename = "sequences/*.fastq.gz"
workflow {
rename_ch = Channel.fromPath(params.rename)
RENAME(rename_ch)
RENAME.out.view()
}
process RENAME {
input:
path read
output:
stdout
script:
"""
for file in $baseDir/sequences/*.fastq.gz;
do
mv -- '$file' '${file%%.fastq.gz}'
done
"""
}
Error:
- cause: Unexpected input: '{' # line 25, column 16.
process RENAME {
^
Tried to use other methods such as basename, but to no avail.
Inside a script block, you just need to escape the Bash dollar-variables and use double quotes so that they can expand. For example:
params.rename = "sequences/*.fastq.gz"
workflow {
RENAME()
}
process RENAME {
debug true
"""
for fastq in ${baseDir}/sequences/*.fastq.gz;
do
echo mv -- "\$fastq" "\${fastq%%.fastq.gz}"
done
"""
}
Results:
$ nextflow run main.nf
N E X T F L O W ~ version 22.04.0
Launching `main.nf` [crazy_brown] DSL2 - revision: 71ada7b0d5
executor > local (1)
[71/4321e6] process > RENAME [100%] 1 of 1 ✔
mv -- /path/to/sequences/A.fastq.gz /path/to/sequences/A
mv -- /path/to/sequences/B.fastq.gz /path/to/sequences/B
mv -- /path/to/sequences/C.fastq.gz /path/to/sequences/C
Also, if you find escaping the Bash variables tedious, you may want to consider using a shell block instead.
I have something like this on a Jenkinsfile (Groovy) and I want to record the stdout and the exit code in a variable in order to use the information later.
sh "ls -l"
How can I do this, especially as it seems that you cannot really run any kind of groovy code inside the Jenkinsfile?
The latest version of the pipeline sh step allows you to do the following;
// Git committer email
GIT_COMMIT_EMAIL = sh (
script: 'git --no-pager show -s --format=\'%ae\'',
returnStdout: true
).trim()
echo "Git committer email: ${GIT_COMMIT_EMAIL}"
Another feature is the returnStatus option.
// Test commit message for flags
BUILD_FULL = sh (
script: "git log -1 --pretty=%B | grep '\\[jenkins-full]'",
returnStatus: true
) == 0
echo "Build full flag: ${BUILD_FULL}"
These options where added based on this issue.
See official documentation for the sh command.
For declarative pipelines (see comments), you need to wrap code into script step:
script {
GIT_COMMIT_EMAIL = sh (
script: 'git --no-pager show -s --format=\'%ae\'',
returnStdout: true
).trim()
echo "Git committer email: ${GIT_COMMIT_EMAIL}"
}
Current Pipeline version natively supports returnStdout and returnStatus, which make it possible to get output or status from sh/bat steps.
An example:
def ret = sh(script: 'uname', returnStdout: true)
println ret
An official documentation.
quick answer is this:
sh "ls -l > commandResult"
result = readFile('commandResult').trim()
I think there exist a feature request to be able to get the result of sh step, but as far as I know, currently there is no other option.
EDIT: JENKINS-26133
EDIT2: Not quite sure since what version, but sh/bat steps now can return the std output, simply:
def output = sh returnStdout: true, script: 'ls -l'
If you want to get the stdout AND know whether the command succeeded or not, just use returnStdout and wrap it in an exception handler:
scripted pipeline
try {
// Fails with non-zero exit if dir1 does not exist
def dir1 = sh(script:'ls -la dir1', returnStdout:true).trim()
} catch (Exception ex) {
println("Unable to read dir1: ${ex}")
}
output:
[Pipeline] sh
[Test-Pipeline] Running shell script
+ ls -la dir1
ls: cannot access dir1: No such file or directory
[Pipeline] echo
unable to read dir1: hudson.AbortException: script returned exit code 2
Unfortunately hudson.AbortException is missing any useful method to obtain that exit status, so if the actual value is required you'd need to parse it out of the message (ugh!)
Contrary to the Javadoc https://javadoc.jenkins-ci.org/hudson/AbortException.html the build is not failed when this exception is caught. It fails when it's not caught!
Update:
If you also want the STDERR output from the shell command, Jenkins unfortunately fails to properly support that common use-case. A 2017 ticket JENKINS-44930 is stuck in a state of opinionated ping-pong whilst making no progress towards a solution - please consider adding your upvote to it.
As to a solution now, there could be a couple of possible approaches:
a) Redirect STDERR to STDOUT 2>&1
- but it's then up to you to parse that out of the main output though, and you won't get the output if the command failed - because you're in the exception handler.
b) redirect STDERR to a temporary file (the name of which you prepare earlier) 2>filename (but remember to clean up the file afterwards) - ie. main code becomes:
def stderrfile = 'stderr.out'
try {
def dir1 = sh(script:"ls -la dir1 2>${stderrfile}", returnStdout:true).trim()
} catch (Exception ex) {
def errmsg = readFile(stderrfile)
println("Unable to read dir1: ${ex} - ${errmsg}")
}
c) Go the other way, set returnStatus=true instead, dispense with the exception handler and always capture output to a file, ie:
def outfile = 'stdout.out'
def status = sh(script:"ls -la dir1 >${outfile} 2>&1", returnStatus:true)
def output = readFile(outfile).trim()
if (status == 0) {
// output is directory listing from stdout
} else {
// output is error message from stderr
}
Caveat: the above code is Unix/Linux-specific - Windows requires completely different shell commands.
this is a sample case, which will make sense I believe!
node('master'){
stage('stage1'){
def commit = sh (returnStdout: true, script: '''echo hi
echo bye | grep -o "e"
date
echo lol''').split()
echo "${commit[-1]} "
}
}
For those who need to use the output in subsequent shell commands, rather than groovy, something like this example could be done:
stage('Show Files') {
environment {
MY_FILES = sh(script: 'cd mydir && ls -l', returnStdout: true)
}
steps {
sh '''
echo "$MY_FILES"
'''
}
}
I found the examples on code maven to be quite useful.
All the above method will work. but to use the var as env variable inside your code you need to export the var first.
script{
sh " 'shell command here' > command"
command_var = readFile('command').trim()
sh "export command_var=$command_var"
}
replace the shell command with the command of your choice. Now if you are using python code you can just specify os.getenv("command_var") that will return the output of the shell command executed previously.
How to read the shell variable in groovy / how to assign shell return value to groovy variable.
Requirement : Open a text file read the lines using shell and store the value in groovy and get the parameter for each line .
Here , is delimiter
Ex: releaseModule.txt
./APP_TSBASE/app/team/i-home/deployments/ip-cc.war/cs_workflowReport.jar,configurable-wf-report,94,23crb1,artifact
./APP_TSBASE/app/team/i-home/deployments/ip.war/cs_workflowReport.jar,configurable-temppweb-report,394,rvu3crb1,artifact
========================
Here want to get module name 2nd Parameter (configurable-wf-report) , build no 3rd Parameter (94), commit id 4th (23crb1)
def module = sh(script: """awk -F',' '{ print \$2 "," \$3 "," \$4 }' releaseModules.txt | sort -u """, returnStdout: true).trim()
echo module
List lines = module.split( '\n' ).findAll { !it.startsWith( ',' ) }
def buildid
def Modname
lines.each {
List det1 = it.split(',')
buildid=det1[1].trim()
Modname = det1[0].trim()
tag= det1[2].trim()
echo Modname
echo buildid
echo tag
}
If you don't have a single sh command but a block of sh commands, returnstdout wont work then.
I had a similar issue where I applied something which is not a clean way of doing this but eventually it worked and served the purpose.
Solution -
In the shell block , echo the value and add it into some file.
Outside the shell block and inside the script block , read this file ,trim it and assign it to any local/params/environment variable.
example -
steps {
script {
sh '''
echo $PATH>path.txt
// I am using '>' because I want to create a new file every time to get the newest value of PATH
'''
path = readFile(file: 'path.txt')
path = path.trim() //local groovy variable assignment
//One can assign these values to env and params as below -
env.PATH = path //if you want to assign it to env var
params.PATH = path //if you want to assign it to params var
}
}
Easiest way is use this way
my_var=`echo 2`
echo $my_var
output
: 2
note that is not simple single quote is back quote ( ` ).
I'm currently making a start on using Nextflow to develop a bioinformatics pipeline. Below, I've created a params.files variable which contains my FASTQ files, and then input this into fasta_files channel.
The process trimming and its scripts takes this channel as the input, and then ideally, I would output all the $sample".trimmed.fq.gz into the output channel, trimmed_channel. However, when I run this script, I get the following error:
Missing output file(s) `trimmed_files` expected by process `trimming` (1)
The nextflow script I'm trying to run is:
#! /usr/bin/env nextflow
params.files = files("$baseDir/FASTQ/*.fastq.gz")
println "fastq files for trimming:$params.files"
fasta_files = Channel.fromPath(params.files)
println "files in the fasta channel: $fasta_files"
process trimming {
input:
file fasta_file from fasta_files
output:
path trimmed_files into trimmed_channel
// the shell script to be run:
"""
#!/usr/bin/env bash
mkdir trimming_report
cd /home/usr/Nextflow
#Finding and renaming my FASTQ files
for file in FASTQ/*.fastq.gz; do
[ -f "\$file" ] || continue
name=\$(echo "\$file" | awk -F'[/]' '{ print \$2 }') #renaming fastq files.
sample=\$(echo "\$name" | awk -F'[.]' '{ print \$1 }') #renaming fastq files.
echo "Found" "\$name" "from:" "\$sample"
if [ ! -e FASTQ/"\$sample"_trimmed.fq.gz ]; then
trim_galore -j 8 "\$file" -o FASTQ #trim the files
mv "\$file"_trimming_report.txt trimming_report #moves to the directory trimming report
else
echo ""\$sample".trimmed.fq.gz exists skipping trim galore"
fi
done
trimmed_files="FASTQ/*_trimmed.fq.gz"
echo \$trimmed_files
"""
}
The script in the process works fine. However, I'm wondering if I'm misunderstanding or missing something obvious. If I've forgot to include something, please let me know and any help is appreciated!
Nextflow does not export the variable trimmed_files to its own scope unless you tell it to do so using the env output qualifier, however doing it that way would not be very idiomatic.
Since you know the pattern of your output files ("FASTQ/*_trimmed.fq.gz"), simply pass that pattern as output:
path "FASTQ/*_trimmed.fq.gz" into trimmed_channel
Some things you do, but probably want to avoid:
Changing directory inside your NF process, don't do this, it entirely breaks the whole concept of nextflow's /work folder setup.
Write a bash loop inside a NF process, if you set up your channels correctly there should only be 1 task per spawned process.
Pallie has already provided some sound advice and, of course, the right answer, which is: environment variables must be declared using the env qualifier.
However, given your script definition, I think there might be some misunderstanding about how best to skip the execution of previously generated results. The cache directive is enabled by default and when the pipeline is launched with the -resume option, additional attempts to execute a process using the same set of inputs, will cause the process execution to be skipped and will produce the stored data as the actual results.
This example uses the Nextflow DSL 2 for my convenience, but is not strictly required:
nextflow.enable.dsl=2
params.fastq_files = "${baseDir}/FASTQ/*.fastq.gz"
params.publish_dir = "./results"
process trim_galore {
tag { "${sample}:${fastq_file}" }
publishDir "${params.publish_dir}/TrimGalore", saveAs: { fn ->
fn.endsWith('.txt') ? "trimming_reports/${fn}" : fn
}
cpus 8
input:
tuple val(sample), path(fastq_file)
output:
tuple val(sample), path('*_trimmed.fq.gz'), emit: trimmed_fastq_files
path "${fastq_file}_trimming_report.txt", emit: trimming_report
"""
trim_galore \\
-j ${task.cpus} \\
"${fastq_file}"
"""
}
workflow {
Channel.fromPath( params.fastq_files )
| map { tuple( it.getSimpleName(), it ) }
| set { sample_fastq_files }
results = trim_galore( sample_fastq_files )
results.trimmed_fastq_files.view()
}
Run using:
nextflow run script.nf \
-ansi-log false \
--fastq_files '/home/usr/Nextflow/FASTQ/*.fastq.gz'
How do I use shell arrays in a Jenkinsfile?
My Jenkins job has a String parameter PROJECTS that is a comma-separated list of projects to build. I have a Build step in which I run some shell script to split that parameter into an array, and then pass that array to a build script:
...
stage("Build") {
steps {
sh"""
projects_list=(${env.PROJECTS//,/ })
./build_script ${projects_list[#]}
"""
}
}
...
however, the Jenkins build keeps failing due to this:
WorkflowScript: 132: unexpected token: # # line 132, column 104.
build_script ${projects_list[#]}
^
1 error
Please see the below code which gives desired result:
Please note : I am using bat command and calling shell scripts inside via cygwin as am using Windows machine.
...
def PROJECTS = "ABC,XYZ"
stage("Build") {
steps {
bat'cygwin.bat -c \"projects_list=(${PROJECTS//,/ }); ./buildscript.sh ${projects_list[#]} \"'
}
}
...
cygwin.bat
IF [%1] == [-c] (
C:\Cygwin\bin\bash.exe -l -i %*
) ELSE (
startC:\Cygwin\bin\mintty.exe --exec C:\Cygwin\bin\bash.exe -l -i
)
With sh: The syntax would be same, just use sh rather than bat and call the command without cywgin.bat -c
I want to use gatk recalibration using pair sample ( tumor and normal). I need to parse the data using pandas. That is what I wroted.
expand("mapped_reads/merged_samples/{sample[1][tumor]}/{sample[1][tumor]}_{sample[1][normal]}.bam", sample=read_table(config["conditions"], ",").iterrows())
this is the condition file:
432,433
434,435
I wrote this rule:
rule gatk_RealignerTargetCreator:
input:
"mapped_reads/merged_samples/{tumor}.sorted.dup.reca.bam",
"mapped_reads/merged_samples/{normal}.sorted.dup.reca.bam",
output:
"mapped_reads/merged_samples/{tumor}/{tumor}_{normal}.realign.intervals"
params:
genome=config['reference']['genome_fasta'],
mills= config['mills'],
ph1_indels= config['know_phy'],
log:
"mapped_reads/merged_samples/logs/{tumor}_{normal}.realign_info.log"
threads: 8
shell:
"gatk -T RealignerTargetCreator -R {params.genome} {params.custom} "
"-nt {threads} "
"-I {wildcard.tumor} -I {wildcard.normal} -known {params.ph1_indels} "
"-o {output} >& {log}"
I have this error:
InputFunctionException in line 17 of /home/maurizio/Desktop/TEST_exome/rules/samfiles.rules:
KeyError: '432/432_433'
Wildcards:
sample=432/432_433
this is the samfiles.rules:
rule samtools_merge_bam:
"""
Merge bam files for multiple units into one for the given sample.
If the sample has only one unit, files will be copied.
"""
input:
lambda wildcards: expand("mapped_reads/bam/{unit}_sorted.bam",unit=config["samples"][wildcards.sample])
output:
"mapped_reads/merged_samples/{sample}.bam"
benchmark:
"benchmarks/samtools/merge/{sample}.txt"
run:
if len(input) > 1:
shell("/illumina/software/PROG2/samtools-1.3.1/samtools merge {output} {input}")
else:
shell("cp {input} {output} && touch -h {output}")
I can only guess because you don't show all relevant rule, but I would say the error occurs because the rule samtools_merge_bam also applies to some later bam file where you have the pattern {tumor}/{tumor}_{normal}...
As a solution, you have to resolve this ambiguity (see the snakemake tutorial). For example, you can constrain the wildcard of samtools_merge_bam to not contain any slashes.
wildcard_constraints:
sample="[^/]+"
You can put the constraint either globally or inside your samtools_merge_bam rule.