I am trying to use the shadow directive on this rule. If I comment out the shadow line, there is no error.
rule Strelka_run:
    input:
        "mapped_reads/merged_samples{case}.sorted.dup.reca.bam",
        "mapped_reads/merged_samples{case}.sorted.dup.reca.bam",
        "Results/manta/{case}/results/variants/candidateSmallIndels.vcf.gz",
    shadow: "shallow"
    output:
        "Results/strelka/{case}/",
        "Results/strelka/{case}/results/variants/somatic.snvs.vcf.gz"
    params:
        genome=config['reference']['genome_fasta'],
        target=config['bed_target']['custom']
    log:
        "log/logss.strelka.txt"
    conda:
        "envs/strelka.yaml"
    shell:
        """
        configureStrelkaSomaticWorkflow.py --normalBam {input[1]} --tumorBam {input[0]} --referenceFasta {params.genome} --indelCandidates {input[2]} --runDir {output[0]} --exome && {output[0]}/runWorkflow.py -m local
        """
I get this error:
RuleException in line 4 of rules/strelka.rules:
Could not resolve wildcards in rule Strelka_run:
case
This might be a very basic question for you guys; however, I have just started with Nextflow and I am struggling with the simplest example.
I will first explain what I have done and then the problem.
Aim: I aim to make a workflow for my bioinformatics analyses like the one here (https://www.nextflow.io/example4.html).
Background: I have installed all the packages that were needed and they all work from the console without any error.
My run: I have used the same script as in the example, only replacing the directory names. Here is how I have arranged the directories:
Location of the script:
~/raman/nflow/script.nf
Location of the Fastq files:
~/raman/nflow/Data/T4_1.fq.gz
~/raman/nflow/Data/T4_2.fq.gz
Location of the transcriptome file:
~/raman/nflow/Genome/trans.fa
The script:
#!/usr/bin/env nextflow

/*
 * The following pipeline parameters specify the reference genomes
 * and read pairs and can be provided as command line options
 */
params.reads = "$baseDir/Data/T4_{1,2}.fq.gz"
params.transcriptome = "$baseDir/HumanGenome/SalmonIndex/gencode.v42.transcripts.fa"
params.outdir = "results"

workflow {
    read_pairs_ch = channel.fromFilePairs( params.reads, checkIfExists: true )

    INDEX(params.transcriptome)
    FASTQC(read_pairs_ch)
    QUANT(INDEX.out, read_pairs_ch)
}

process INDEX {
    tag "$transcriptome.simpleName"

    input:
    path transcriptome

    output:
    path 'index'

    script:
    """
    salmon index --threads $task.cpus -t $transcriptome -i index
    """
}

process FASTQC {
    tag "FASTQC on $sample_id"
    publishDir params.outdir

    input:
    tuple val(sample_id), path(reads)

    output:
    path "fastqc_${sample_id}_logs"

    script:
    """
    fastqc "$sample_id" "$reads"
    """
}

process QUANT {
    tag "$pair_id"
    publishDir params.outdir

    input:
    path index
    tuple val(pair_id), path(reads)

    output:
    path pair_id

    script:
    """
    salmon quant --threads $task.cpus --libType=U -i $index -1 ${reads[0]} -2 ${reads[1]} -o $pair_id
    """
}
Output:
(base) ntr@ser:~/raman/nflow$ nextflow script.nf
N E X T F L O W ~ version 22.10.1
Launching `script.nf` [modest_meninsky] DSL2 - revision: 032a643b56
executor > local (2)
executor > local (2)
[- ] process > INDEX (gencode) -
[28/02cde5] process > FASTQC (FASTQC on T4) [100%] 1 of 1, failed: 1 ✘
[- ] process > QUANT -
Error executing process > 'FASTQC (FASTQC on T4)'
Caused by:
Missing output file(s) `fastqc_T4_logs` expected by process `FASTQC (FASTQC on T4)`
Command executed:
fastqc "T4" "T4_1.fq.gz T4_2.fq.gz"
Command exit status:
0
Command output:
(empty)
Command error:
Skipping 'T4' which didn't exist, or couldn't be read
Skipping 'T4_1.fq.gz T4_2.fq.gz' which didn't exist, or couldn't be read
Work dir:
/home/ruby/raman/nflow/work/28/02cde5184f4accf9a05bc2ded29c50
Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`
I believe I have an issue with my understanding of baseDir. I am assuming that baseDir is the directory where my script.nf file is located. I am not sure what is going wrong and how I can fix it.
Could anyone please help or guide me?
Thank you
Caused by:
Missing output file(s) `fastqc_T4_logs` expected by process `FASTQC (FASTQC on T4)`
Nextflow complains when it can't find the declared output files. This can occur even if the command completes successfully, i.e. with exit status 0. The problem here is that fastqc simply skips files that don't exist or can't be read (e.g. permissions problems), but it does produce these warnings:
Skipping 'T4' which didn't exist, or couldn't be read
Skipping 'T4_1.fq.gz T4_2.fq.gz' which didn't exist, or couldn't be read
The solution is to just make sure all files exist. Note that the fromFilePairs factory method produces a list of files in the second element. Therefore quoting a space-separated pair of filenames is also problematic. All you need is:
    script:
    """
    fastqc ${reads}
    """
I have a Makefile in which I need to pass an image name to another script. The issue is that the image name has ':' and '-' in it, for example image.name-v1:latest. From what I found on Google, a ':' in a Makefile variable causes issues. How can I resolve this in the Makefile? Below is the sample code that I am trying, in which IMAGE has image.name-v1:latest:
IMAGE2 ?= ''

.PHONY: image
image:
	IMAGE2 := $(subst :,\:,$(IMAGE_R)) ## IMAGE_R is a run time variable for make target
	rr/image.sh $(IMAGE2)
Error:
IMAGE2 := docker-rs\:latest ## Test Image using Image Reference
/bin/sh: 1: IMAGE2: not found
Makefile:75: recipe for target 'image' failed
make: *** [image] Error 127
SHELL SCRIPT: image.sh
#!/bin/bash
set -ex
IMAGE="$1"
echo "Image: $IMAGE"
You could try escaping the colon like this:
# Mock IMAGE with colon in name
IMAGE := test:123

# Create a new variable, adding an escaping backslash
IMAGE2 := $(subst :,\:,$(IMAGE))

# Use IMAGE2 from here on
$(info IMAGE $(IMAGE))
$(info IMAGE2 $(IMAGE2))

.PHONY: xyz-pp
xyz-pp: $(IMAGE2)

.PHONY: $(IMAGE2)
$(IMAGE2):
	@echo "1234" > $(IMAGE2)
	@ls
	@rr/image.sh $(IMAGE2)
I added a script in the subfolder rr:
echo "------------- SCRIPT ----------"
echo PARAM: $1
echo $(find -name "$1")
echo "------------- SCRIPT ----------"
output:
> make
IMAGE test:123
IMAGE2 test\:123
makefile rr test:123
------------- SCRIPT ----------
PARAM: test:123
./test:123
------------- SCRIPT ----------
So here I just replace ":" with "\:" in a new variable IMAGE2 and then use that.
Note: I tested this by just echo'ing your command (since I don't have that file) and touch'ing IMAGE2 to create the file
I am new to Snakemake and I am trying to develop some pipelines. I am encountering some problems when I use wildcards while trying to automate my bioinformatic analyses as much as possible. I run into trouble when the pipeline becomes more complex (as shown below): it looks like Snakemake does not resolve the wildcards correctly. During a dry run of the Snakefile, the wildcard values look correct in the execution of some rules. However, the same wildcards lead to an error in a different step (rule) of the pipeline, and I cannot figure out why. Below I provide the code and the output of a dry run.
num=["327905-LR-41624_normal","327907-LR-41624_tumor"]
num_normal=["327905-LR-41624"]
num_tumor=["327907-LR-41624"]
path="/path/to/Snakemake/"
genome="/path/to/references_genome/Mus_musculus.GRCm38.dna_rm.toplevel.fa"

rule all:
    input:
        expand("/path/to/Snakemake/AS-{num_tum}_tumor_no_dupl_sort_RG_LB.bam",num_tum=num_tumor),
        expand("/path/to/Snakemake/AS-{num_norm}_normal_no_dupl_sort_RG_LB.bam",num_norm=num_normal)

ruleorder: samtools_sort > remove_duplicates > samtools_index #> add_readgroup_tumor > add_readgroup_normal

rule trim_galore:
    input:
        r1="/path/to/Snakemake/AS-{num}_R1.fastq",
        r2="/path/to/Snakemake/AS-{num}_R2.fastq"
    output:
        "/path/to/Snakemake/AS-{num }_R1_val_1.fq",
        "/path/to/Snakemake/AS-{num }_R2_val_2.fq"
    shell:
        "module load trim-galore/0.5.0 ; module load pypy/2.7-6.0.0 ; trim_galore --output_dir /path/to/Snakemake/ --paired {input.r1} {input.r2} "

rule bwa_mem:
    input:
        R1="/path/to/Snakemake/AS-{num}_R1_val_1.fq",
        R2="/path/to/Snakemake/AS-{num}_R2_val_2.fq"
    output:
        "/path/to/Snakemake/AS-{num}.bam"
    shell:
        "module load samtools/default ; module load bwa/0.7.8 ; bwa mem {genome} {input.R1} {input.R2} | samtools view -h -b > {output} "

rule samtools_sort:
    input:
        "/path/to/Snakemake/AS-{num}.bam"
    output:
        "/path/to/Snakemake/AS-{num}_sort.bam"
    shell:
        "module load samtools/default ; samtools sort -n -O BAM {input} > {output} "

rule remove_duplicates:
    input:
        "/path/to/Snakemake/AS-{num}_sort.bam"
    output:
        outbam="/path/to/Snakemake/AS-{num}_no_dupl_sort.bam",
        metrics="/path/to/Snakemake/AS-{num}_dupl_metrics.txt"
    shell:
        "module load gatk/4.0.9.0 ; gatk MarkDuplicates -I {input} -O {output.outbam} -M {output.metrics} --REMOVE_DUPLICATES=true "

rule samtools_index:
    input:
        "/path/to/Snakemake/AS-{num}_no_dupl_sort.bam"
    output:
        "/path/to/Snakemake/AS-{num}_no_dupl_sort.bam.bai"
    shell:
        "module load samtools/default ; samtools index {input} "

rule add_readgroup_normal:
    input:
        "/path/to/Snakemake/AS-{num_normal}_normal_no_dupl_sort.bam"
    output:
        "/path/to/Snakemake/AS-{num_normal}_normal_no_dupl_sort_RG_LB.bam"
    shell:
        "module load gatk/4.0.9.0 ; gatk AddOrReplaceReadGroups -PL Illumina -LB { num_normal } -PU { num_normal } -SM NORMAL -I { input } -O {output} "

rule add_readgroup_tumor:
    input:
        "/path/to/Snakemake/AS-{num_tumor}_tumor_no_dupl_sort.bam"
    output:
        "/path/to/Snakemake/AS-{num_tumor}_tumor_no_dupl_sort_RG_LB.bam"
    shell:
        "module load gatk/4.0.9.0 ; gatk AddOrReplaceReadGroups -PL Illumina -LB { num_tumor } -PU { num_tumor } -SM TUMOR -I { input } -O {output} "
When I test the Snakefile with the command:
.local/bin/snakemake -s Snakefile_pipeline --dryrun
I get the following:
Building DAG of jobs...
Job counts:
        count   jobs
        1       add_readgroup_normal
        1       add_readgroup_tumor
        1       all
        2       bwa_mem
        2       remove_duplicates
        2       samtools_sort
        2       trim_galore
        11

[Mon Apr 8 16:14:27 2019]
rule trim_galore:
    input: /path/to/Snakemake/AS-327907-LR-41624_tumor_R1.fastq, /path/to/Snakemake/AS-327907-LR-41624_tumor_R2.fastq
    output: /path/to/Snakemake/AS-327907-LR-41624_tumor_R1_val_1.fq, /path/to/Snakemake/AS-327907-LR-41624_tumor_R2_val_2.fq
    jobid: 9
    wildcards: num=327907-LR-41624_tumor

[Mon Apr 8 16:14:27 2019]
rule trim_galore:
    input: /path/to/Snakemake/AS-327905-LR-41624_normal_R1.fastq, /path/to/Snakemake/AS-327905-LR-41624_normal_R2.fastq
    output: /path/to/Snakemake/AS-327905-LR-41624_normal_R1_val_1.fq, /path/to/Snakemake/AS-327905-LR-41624_normal_R2_val_2.fq
    jobid: 10
    wildcards: num=327905-LR-41624_normal

[Mon Apr 8 16:14:27 2019]
rule bwa_mem:
    input: /path/to/Snakemake/AS-327905-LR-41624_normal_R1_val_1.fq, /path/to/Snakemake/AS-327905-LR-41624_normal_R2_val_2.fq
    output: /path/to/Snakemake/AS-327905-LR-41624_normal.bam
    jobid: 8
    wildcards: num=327905-LR-41624_normal

[Mon Apr 8 16:14:27 2019]
rule bwa_mem:
    input: /path/to/Snakemake/AS-327907-LR-41624_tumor_R1_val_1.fq, /path/to/Snakemake/AS-327907-LR-41624_tumor_R2_val_2.fq
    output: /path/to/Snakemake/AS-327907-LR-41624_tumor.bam
    jobid: 7
    wildcards: num=327907-LR-41624_tumor

[Mon Apr 8 16:14:27 2019]
rule samtools_sort:
    input: /path/to/Snakemake/AS-327907-LR-41624_tumor.bam
    output: /path/to/Snakemake/AS-327907-LR-41624_tumor_sort.bam
    jobid: 5
    wildcards: num=327907-LR-41624_tumor

[Mon Apr 8 16:14:27 2019]
rule samtools_sort:
    input: /path/to/Snakemake/AS-327905-LR-41624_normal.bam
    output: /path/to/Snakemake/AS-327905-LR-41624_normal_sort.bam
    jobid: 6
    wildcards: num=327905-LR-41624_normal

[Mon Apr 8 16:14:27 2019]
rule remove_duplicates:
    input: /path/to/Snakemake/AS-327907-LR-41624_tumor_sort.bam
    output: /path/to/Snakemake/AS-327907-LR-41624_tumor_no_dupl_sort.bam, /path/to/Snakemake/AS-327907-LR-41624_tumor_dupl_metrics.txt
    jobid: 3
    wildcards: num=327907-LR-41624_tumor

[Mon Apr 8 16:14:27 2019]
rule remove_duplicates:
    input: /path/to/Snakemake/AS-327905-LR-41624_normal_sort.bam
    output: /path/to/Snakemake/AS-327905-LR-41624_normal_no_dupl_sort.bam, /path/to/Snakemake/AS-327905-LR-41624_normal_dupl_metrics.txt
    jobid: 4
    wildcards: num=327905-LR-41624_normal

[Mon Apr 8 16:14:27 2019]
rule add_readgroup_normal:
    input: /path/to/Snakemake/AS-327905-LR-41624_normal_no_dupl_sort.bam
    output: /path/to/Snakemake/AS-327905-LR-41624_normal_no_dupl_sort_RG_LB.bam
    jobid: 2
    wildcards: num_normal=327905-LR-41624

RuleException in line 93 of /home/l136n/Snakefile_mapping_snv_call_pipeline2:
NameError: The name ' num_normal ' is unknown in this context. Please make sure that you defined that variable. Also note that braces not used for variable access have to be escaped by repeating them, i.e. {{print $1}}
I have googled the error but found little help. I also double-checked the pipeline for any inconsistency. What I expect as output is indicated in the rule "all". The rules "add_readgroup_normal" and "add_readgroup_tumor" are supposed to take different subsets of the input files generated by the previous steps, which run on all files. I wonder if the problem somehow arises because of this separation into two subsets.
I repeat that I am quite new to Snakemake, so I might be missing something silly somewhere! Any help would be really appreciated, as I am completely stuck!
Thank you so much in advance!
num=["327905-LR-41624_normal","327907-LR-41624_tumor"]
normal=["327905-LR-41624_normal"]
num_tumor=["327907-LR-41624_tumor"]
path="/path/to/Snakemake/"
genome="/icgc/dkfzlsdf/analysis/B210/references_genome/Mus_musculus.GRCm38.dna_rm.toplevel.fa"

rule all:
    input:
        "/path/to/Snakemake/AS-327905-LR-41624_normal_R1_val_1.fq",
        "/path/to/Snakemake/AS-327905-LR-41624_normal_R2_val_2.fq",
        "/path/to/Snakemake/AS-327907-LR-41624_tumor_R1_val_1.fq",
        "/path/to/Snakemake/AS-327907-LR-41624_tumor_R2_val_2.fq",
        "/path/to/Snakemake/AS-327905-LR-41624_normal_no_dupl_sort.bam.bai",
        "/path/to/Snakemake/AS-327907-LR-41624_tumor_no_dupl_sort.bam.bai",
        "/path/to/Snakemake/AS-327905-LR-41624_normal_RG.bam"
        "/path/to/Snakemake/AS-327907-LR-41624_tumor_RG.bam"

rule trim_galore:
    input:
        r1="/path/to/Snakemake/AS-{num}_R1.fastq",
        r2="/path/to/Snakemake/AS-{num}_R2.fastq"
    output:
        "/path/to/Snakemake/AS-{num }_R1_val_1.fq",
        "/path/to/Snakemake/AS-{num }_R2_val_2.fq"
    shell:
        "module load trim-galore/0.5.0 ; module load pypy/2.7-6.0.0 ; trim_galore --output_dir /path/to/Snakemake/ --paired {input.r1} {input.r2} "

rule bwa_mem:
    input:
        R1="/path/to/Snakemake/AS-{num}_R1_val_1.fq",
        R2="/path/to/Snakemake/AS-{num}_R2_val_2.fq"
    output:
        "/path/to/Snakemake/AS-{num}.bam"
    shell:
        "module load samtools/default ; module load bwa/0.7.8 ; bwa mem {genome} {input.R1} {input.R2} | samtools view -h -b > {output} "

rule samtools_sort:
    input:
        "/path/to/Snakemake/AS-{num}.bam"
    output:
        "/path/to/Snakemake/AS-{num}_sort.bam"
    shell:
        "module load samtools/default ; samtools sort -n -O BAM {input} > {output} "

rule remove_duplicates:
    input:
        "/path/to/Snakemake/AS-{num}_sort.bam"
    output:
        outbam="/path/to/Snakemake/AS-{num}_no_dupl_sort.bam",
        metrics="/path/to/Snakemake/AS-{num}_dupl_metrics.txt"
    shell:
        "module load gatk/4.0.9.0 ; gatk MarkDuplicates -I {input} -O {output.outbam} -M {output.metrics} --REMOVE_DUPLICATES=true "

rule samtools_index:
    input:
        "/path/to/Snakemake/AS-{num}_no_dupl_sort.bam"
    output:
        "/path/to/Snakemake/AS-{num}_no_dupl_sort.bam.bai"
    shell:
        "module load samtools/default ; samtools index {input} "

rule add_readgroup_normal:
    input:
        "/path/to/Snakemake/AS-{normal}_no_dupl_sort.bam"
    output:
        "/path/to/Snakemake/AS-{normal}_RG.bam"
    shell:
        "module load gatk/4.0.9.0 ; gatk AddOrReplaceReadGroups -PL Illumina -LB { wildcards.normal } -PU { wildcards.normal } -SM NORMAL -I { input } -O {output} "

rule add_readgroup_tumor:
    input:
        "/path/to/Snakemake/AS-{num}_no_dupl_sort.bam"
    output:
        "/path/to/Snakemake/AS-{num_,'.*tumor.*'}_RG.bam"
    shell:
        "module load gatk/4.0.9.0 ; gatk AddOrReplaceReadGroups -PL Illumina -LB { wildcards.num } -PU { wildcards.num } -SM TUMOR -I { input } -O {output} "
Error:
Building DAG of jobs...
MissingInputException in line 37 of /home/l136n/Snakefile_mapping_snv_call_pipeline2b1:
Missing input files for rule trim_galore:
/path/to/Luca/Snakemake/AS-327905-LR-41624_normal_RG.bam/path/to/Luca/Snakemake/AS-327907-LR-41624_tumor_RG_R1.fastq
/path/to/Snakemake/AS-327905-LR-41624_normal_RG.bam/path/to/Luca/Snakemake/AS-327907-LR-41624_tumor_RG_R2.fastq
Wildcards are accessible in the shell command using the syntax {wildcards.var}, not {var}. You have the latter in rule add_readgroup_normal.
I thought I would provide the solution, even if the post is a bit old now. The error was simply due to the presence of spaces inside "{ wildcards.var }".
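For reference, here is a minimal sketch of the corrected rule, using {wildcards.normal} with no spaces inside the braces (paths and module versions copied from the second version of the question):
rule add_readgroup_normal:
    input:
        "/path/to/Snakemake/AS-{normal}_no_dupl_sort.bam"
    output:
        "/path/to/Snakemake/AS-{normal}_RG.bam"
    shell:
        "module load gatk/4.0.9.0 ; "
        "gatk AddOrReplaceReadGroups -PL Illumina -LB {wildcards.normal} "
        "-PU {wildcards.normal} -SM NORMAL -I {input} -O {output}"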
I want to use gatk recalibration on paired samples (tumor and normal). I need to parse the data using pandas. This is what I wrote:
expand("mapped_reads/merged_samples/{sample[1][tumor]}/{sample[1][tumor]}_{sample[1][normal]}.bam", sample=read_table(config["conditions"], ",").iterrows())
this is the condition file:
432,433
434,435
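For clarity, here is a minimal plain-Python sketch of roughly what that expand() call builds from this condition file. It assumes a header line tumor,normal (not shown above) and a hypothetical file name conditions.csv:
import pandas as pd

# Read the condition table; columns are assumed to be named "tumor" and "normal".
conditions = pd.read_table("conditions.csv", sep=",")

# One target path per tumor/normal pair, matching the expand() pattern above.
targets = [
    "mapped_reads/merged_samples/{tumor}/{tumor}_{normal}.bam".format(
        tumor=row["tumor"], normal=row["normal"]
    )
    for _, row in conditions.iterrows()
]
print(targets)
# ['mapped_reads/merged_samples/432/432_433.bam',
#  'mapped_reads/merged_samples/434/434_435.bam']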
I wrote this rule:
rule gatk_RealignerTargetCreator:
    input:
        "mapped_reads/merged_samples/{tumor}.sorted.dup.reca.bam",
        "mapped_reads/merged_samples/{normal}.sorted.dup.reca.bam",
    output:
        "mapped_reads/merged_samples/{tumor}/{tumor}_{normal}.realign.intervals"
    params:
        genome=config['reference']['genome_fasta'],
        mills=config['mills'],
        ph1_indels=config['know_phy'],
    log:
        "mapped_reads/merged_samples/logs/{tumor}_{normal}.realign_info.log"
    threads: 8
    shell:
        "gatk -T RealignerTargetCreator -R {params.genome} {params.custom} "
        "-nt {threads} "
        "-I {wildcard.tumor} -I {wildcard.normal} -known {params.ph1_indels} "
        "-o {output} >& {log}"
I have this error:
InputFunctionException in line 17 of /home/maurizio/Desktop/TEST_exome/rules/samfiles.rules:
KeyError: '432/432_433'
Wildcards:
sample=432/432_433
this is the samfiles.rules:
rule samtools_merge_bam:
    """
    Merge bam files for multiple units into one for the given sample.
    If the sample has only one unit, files will be copied.
    """
    input:
        lambda wildcards: expand("mapped_reads/bam/{unit}_sorted.bam", unit=config["samples"][wildcards.sample])
    output:
        "mapped_reads/merged_samples/{sample}.bam"
    benchmark:
        "benchmarks/samtools/merge/{sample}.txt"
    run:
        if len(input) > 1:
            shell("/illumina/software/PROG2/samtools-1.3.1/samtools merge {output} {input}")
        else:
            shell("cp {input} {output} && touch -h {output}")
I can only guess because you don't show all relevant rules, but I would say the error occurs because the rule samtools_merge_bam also applies to some later bam file where you have the pattern {tumor}/{tumor}_{normal}...
As a solution, you have to resolve this ambiguity (see the snakemake tutorial). For example, you can constrain the wildcard of samtools_merge_bam to not contain any slashes.
wildcard_constraints:
    sample="[^/]+"
You can put the constraint either globally or inside your samtools_merge_bam rule.
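For example, placed inside the rule it would look something like this (a sketch based on the samtools_merge_bam rule from the question, with only the constraint added):
rule samtools_merge_bam:
    wildcard_constraints:
        sample="[^/]+"   # the sample wildcard may not contain a slash
    input:
        lambda wildcards: expand("mapped_reads/bam/{unit}_sorted.bam", unit=config["samples"][wildcards.sample])
    output:
        "mapped_reads/merged_samples/{sample}.bam"
    benchmark:
        "benchmarks/samtools/merge/{sample}.txt"
    run:
        if len(input) > 1:
            shell("/illumina/software/PROG2/samtools-1.3.1/samtools merge {output} {input}")
        else:
            shell("cp {input} {output} && touch -h {output}")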
I've followed the simple Firefox build instructions, but the following error occurred:
Gabor@AMANDA /c/dev/mozilla-central
$ ./mach build
0:00.36 C:/mozilla-build/msys/bin/sh.exe -c c:/mozilla-build/python/python.exe c:/dev/mozilla-central/build/pymake/make.py -f client.mk -s
0:03.79 Adding client.mk options from c:/dev/mozilla-central/.mozconfig:
0:03.79 MOZ_OBJDIR=$(TOPSRCDIR)/obj-ff
0:03.79 FOUND_MOZCONFIG := c:/dev/mozilla-central/.mozconfig
0:04.83 c:\dev\mozilla-central\obj-ff\backend.RecursiveMakeBackend.pp:1:0:Target 'backend.RecursiveMakeBackend' does not match the static pattern 'c:/C'
0:04.83 c:\dev\mozilla-central\client.mk:398:0: command 'c:/mozilla-build/python/python.exe c:/dev/mozilla-central/build/pymake/pymake/../make.py -j4 -C c:/dev/mozilla-central/obj-ff' failed, return code 2
0:04.83 c:\dev\mozilla-central\client.mk:185:0: command 'c:/mozilla-build/python/python.exe c:/dev/mozilla-central/build/pymake/pymake/../make.py -f c:/dev/mozilla-central/client.mk realbuild' failed, return code 2
0:04.95 0 compiler warnings present.
2
Does anyone know what the problem could be?
My mozconfig only contains this:
mk_add_options MOZ_OBJDIR=@TOPSRCDIR@/obj-ff