I am attempting to write a Snakemake rule that treats some files differently from others based on their grouping. My file list is loaded from a sample.tsv file.
I thought this would be relatively easy, as I believed that populating the wildcards in rule all would trigger execution of the corresponding rules, but that does not appear to be the case.
Here is a pared-down version of what I'm working on.
List of sample files. Note that the chip category here is what becomes important for defining my groups:
tissue replicate chip file
leaf rep2 input 00.data/chip_seq/input/leaf_input_rep2.fastq
leaf rep1 input 00.data/chip_seq/input/leaf_input_rep1.fastq
leaf rep2 H3K36me3 00.data/chip_seq/H3K36me3/leaf_H3K36me3_rep2.fastq
leaf rep1 H3K36me3 00.data/chip_seq/H3K36me3/leaf_H3K36me3_rep1.fastq
leaf rep1 H3K56ac 00.data/chip_seq/H3K56ac/leaf_H3K56ac_rep1.fastq
leaf rep2 H3K56ac 00.data/chip_seq/H3K56ac/leaf_H3K56ac_rep2.fastq
In my script I have then divided these into two sub-categories:
broad = ['H3K36me3']
narrow = ["H3K56ac"]
rule all:
    input:
        # Align all reads
        expand("02.unique_align/{tissue}_{chip}_{replicate}_unique_bowtie2_algn.bam",
               tissue = samples['tissue'], replicate = samples['replicate'],
               chip = samples['chip']),
        # Should expand over ONLY the narrow groups, causing the below rule
        # run_bcp_peak_caller_narrow to trigger
        expand("03.called_peaks/{tissue}_{replicate}_{chip}_peaks_region_narrow.bed",
               tissue = narrow_peaks['tissue'],
               replicate = narrow_peaks['replicate'],
               chip = narrow),
        # Should expand over ONLY the broad groups, causing the below rule
        # run_bcp_peak_caller_broad to trigger
        expand("03.called_peaks/{tissue}_{replicate}_{chip}_peaks_region_broad.bed",
               tissue = samples['tissue'],
               replicate = samples['replicate'],
               chip = broad)
## Two functions: one to get the input files, defined here as `get_input`; the other, `get_chip`, to retrieve the chip files
def get_input(wildcards):
    z = glob.glob(os.path.join("02.unique_align/", (wildcards.tissue + "_" +
        wildcards.replicate + "_" + "input_unique_bowtie2_algn.bam")))
    return z

def get_chip(wildcards):
    z = glob.glob(os.path.join("02.unique_align/", (wildcards.tissue + "_" +
        wildcards.replicate + "_" + wildcards.chip + "_" +
        "unique_bowtie2_algn.bam")))
    return z
rule run_bcp_peak_caller_broad:
    input:
        chip_input = get_input,
        chip_mod = get_chip
    params:
        "03.called_peaks/{tissue}_{replicate}_{chip}_peaks_broad"
    output:
        "03.called_peaks/{tissue}_{replicate}_{chip}_peaks_broad.bed"
    shell:"""
    peakranger bcp \
    --format bam \
    --verbose \
    --pval .001 \
    --data {input.chip_mod} \
    --control {input.chip_input} \
    --output {params}
    """
rule run_bcp_peak_caller_narrow:
    input:
        chip_input = get_input,
        chip_mod = get_chip
    params:
        "03.called_peaks/{tissue}_{replicate}_{chip}_peaks_narrow"
    output:
        "03.called_peaks/{tissue}_{replicate}_{chip}_peaks_narrow.bed"
    shell:"""
    peakranger \
    --format bam \
    --verbose \
    --pval .001 \
    --data {input.chip_mod} \
    --control {input.chip_input} \
    --output {params}
    """
Error is as follows:
MissingInputException in line 39 of /scratch/jpm73279/04.lncRNA/02.Analysis/24.regenerate_expression_peaks/Generate_peak_lists.snake:
Missing input files for rule all:
03.called_peaks/root_rep1_H3K4me1_peaks_region_broad.bed
03.called_peaks/root_rep2_H3K36me3_peaks_region_broad.bed
03.called_peaks/leaf_rep1_H3K4me1_peaks_region_broad.bed
03.called_peaks/root_rep1_H3K36me3_peaks_region_broad.bed
03.called_peaks/leaf_rep2_H3K36me3_peaks_region_broad.bed
03.called_peaks/root_rep2_H3K4me1_peaks_region_broad.bed
My understanding is that Snakemake populates the file combinations found in the rule all section and then identifies what steps need to be run upstream.
Any help would be greatly appreciated.
You are right in your understanding; when no target is specified on the command line, Snakemake will find the first rule and try to generate its input.
The problem is that rule all asks for files that cannot be 'made' by any of your rules. I'll show a side-by-side comparison of the mistake:
rule all:
    03.called_peaks/root_rep1_H3K4me1_peaks_region_broad.bed

rule run_bcp_peak_caller_broad:
    output:
        "03.called_peaks/{tissue}_{replicate}_{chip}_peaks_broad.bed"
See the difference? The files you say you want to generate end with peaks_region_broad.bed, yet your rules produce output ending with peaks_broad.bed.
Take another look at rule all; you probably want to remove the _region part of those strings.
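For illustration, here is a minimal sketch of what rule all could look like once the _region part is removed, so the requested targets match what the two peak-calling rules actually produce (for simplicity this reuses the samples columns for both expands, together with the narrow/broad lists from the question):

rule all:
    input:
        expand("02.unique_align/{tissue}_{chip}_{replicate}_unique_bowtie2_algn.bam",
               tissue = samples['tissue'], replicate = samples['replicate'],
               chip = samples['chip']),
        # these now end in peaks_narrow.bed / peaks_broad.bed, matching the rule outputs
        expand("03.called_peaks/{tissue}_{replicate}_{chip}_peaks_narrow.bed",
               tissue = samples['tissue'], replicate = samples['replicate'],
               chip = narrow),
        expand("03.called_peaks/{tissue}_{replicate}_{chip}_peaks_broad.bed",
               tissue = samples['tissue'], replicate = samples['replicate'],
               chip = broad)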
I'm writing a pipeline with Snakemake and the program does not run the rule stringtie. I can't find what I'm doing wrong. I have already run the rules fastp and star; the problem is specific to the stringtie rule.
include:
    'config.py'

rule all:
    input:
        expand(FASTP_DIR + "{sample}R{read_no}.fastq", sample=SAMPLES, read_no=['1', '2']), #fastp
        expand(STAR_DIR + STAR_DIR + "output/{sample}/{sample}Aligned.sortedByCoord.out.bam", sample=SAMPLES), #STAR
        expand(STRINGTIE_DIR + "/{sample}/{sample}Aligned.sortedByCoord.out.gtf", sample=SAMPLES),
        GTF_DIR + "path_samplesGTF.txt"
rule fastp:
    input:
        R1= DATA_DIR + "{sample}R1_001.fastq.gz",
        R2= DATA_DIR + "{sample}R2_001.fastq.gz"
    output:
        R1out= FASTP_DIR + "{sample}R1.fastq",
        R2out= FASTP_DIR + "{sample}R2.fastq"
    params:
        data_dir = DATA_DIR,
        name_sample = "{sample}"
    log: FASTP_LOG + "{sample}.html"
    message: "Executando o programa FASTP"
    run:
        shell('fastp -i {input.R1} -I {input.R2} -o {output.R1out} -O {output.R2out} \
            -h {log} -j {log}')
        shell("find {params.data_dir} -type f -name '{params.name_sample}*' -delete ")
rule star:
    input:
        idx_star = IDX_DIR,
        R1 = FASTP_DIR + "{sample}R1.fastq",
        R2 = FASTP_DIR + "{sample}R2.fastq",
        parameters = "parameters.txt",
    params:
        outdir = STAR_DIR + "output/{sample}/{sample}",
        star_dir = STAR_DIR,
        star_sample = '{sample}'
    # threads: 18
    output:
        out = STAR_DIR + "output/{sample}/{sample}Aligned.sortedByCoord.out.bam"
        #run_time = STAR + "log/star_run.time"
    # log: STAR_LOG
    # benchmark: BENCHMARK + "star/{sample_star}"
    run:
        shell("STAR --runThreadN 12 --genomeDir {input.idx_star} \
            --readFilesIn {input.R1} {input.R2} --outFileNamePrefix {params.outdir} \
            --parametersFiles {input.parameters} \
            --quantMode TranscriptomeSAM GeneCounts \
            --genomeChrBinNbits 12")
        # shell("find {params.star_dir} -type f ! -name '{params.star_sample}Aligned.sortedByCoord.out.bam' -delete")
rule stringtie:
    input:
        star_output = STAR_DIR + "output/{sample}/{sample}Aligned.sortedByCoord.out.bam"
    output:
        stringtie_output = STRINGTIE_DIR + "/{sample}/{sample}Aligned.sortedByCoord.out.gtf"
    run:
        shell("stringtie {input.star_output} -o {output.stringtie_output} \
            -v -p 12 ")

rule grep_gtf:
    input:
        list_gtf = STRINGTIE_DIR
    output:
        paths = GTF_DIR + "path_samplesGTF.txt"
    shell:
        "find {input.list_gtf} | grep .gtf > {output.paths}"
This is the output I get with the option dry-run (flag -n)
Building DAG of jobs...
Job counts:
count jobs
1 all
1 grep_gtf
2
[Fri Apr 17 15:59:24 2020]
rule grep_gtf:
input: /homelocal/boralli/workdir/pipeline_v4/STRINGTIE/
output: /homelocal/boralli/workdir/pipeline_v4/GTF/path_samplesGTF.txt
jobid: 1
find /homelocal/boralli/workdir/pipeline_v4/STRINGTIE/ | grep .gtf >
/homelocal/boralli/workdir/pipeline_v4/GTF/path_samplesGTF.txt
[Fri Apr 17 15:59:24 2020]
localrule all:
input: /homelocal/boralli/workdir/pipeline_v4/GTF/path_samplesGTF.txt
jobid: 0
Job counts:
count jobs
1 all
1 grep_gtf
2
This was a dry-run (flag -n). The order of jobs does not reflect the order of execution.
I really don't know what's going on. The same pipeline worked before.
In addition to Troy's comment:
You specify a directory as the input of your rule grep_gtf. Since that directory probably already exists, rule stringtie does not need to be executed before grep_gtf runs.
Using a directory as input isn't really a good idea. If you need the outputs of rule stringtie before executing rule grep_gtf, I suggest you specify the output files of rule stringtie as the input of rule grep_gtf.
So your rule grep_gtf should be something like:
rule grep_gtf:
    input:
        expand(STRINGTIE_DIR + "/{sample}/{sample}Aligned.sortedByCoord.out.gtf", sample=SAMPLES)
    output:
        paths = GTF_DIR + "path_samplesGTF.txt"
    shell:
        "find {STRINGTIE_DIR} | grep .gtf > {output.paths}"
EDIT:
I think there's a bad copy/paste in rule all, where STAR_DIR appears twice:
expand(STAR_DIR + STAR_DIR + "output/{sample}/{sample}Aligned.sortedByCoord.out.bam",sample=SAMPLES), #STAR
I also think there is a misunderstanding of the Snakemake "workflow" concept. You do not need to specify the outputs of all rules in rule all; you only need to specify the last file(s) of the workflow. Snakemake will decide which rules need to be run in order to create that final file. I don't really understand why your Snakemake does not want to build the gtf files, since you ask for them in rule all, but I do see why rule grep_gtf does not need the output of rule stringtie to run.
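Putting those two points together, here is a minimal sketch of what rule all could look like with the duplicated STAR_DIR removed; with the fixed grep_gtf above, the final path_samplesGTF.txt on its own would be enough for Snakemake to work backwards through stringtie, star and fastp:

rule all:
    input:
        # duplicated STAR_DIR removed
        expand(STAR_DIR + "output/{sample}/{sample}Aligned.sortedByCoord.out.bam", sample=SAMPLES),
        # requesting only the last file of the workflow is sufficient;
        # Snakemake schedules fastp, star and stringtie as needed to produce it
        GTF_DIR + "path_samplesGTF.txt"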
I'm currently running a Snakemake checkpoint that appears to throw a non-zero exit code even after the command completes correctly, and I am unsure how to resolve the problem.
The purpose of the script below is to parse a file of coordinates, the bed_file, extract all of those regions from a BAM file, rna_file, and eventually assemble these regions. The code is below, and my Snakemake version is 5.6.0.
# Pull coordinates from a BAM file, and use the command samtools view to extract the
# corresponding data, naming the output as the coordinate file, here named
# "6:25274434-25278245.bam". There are an unknown number of output files.
checkpoint pull_reads_for_BAM:
    input:
        bed_file = get_lncRNA_file,
        rna_file = get_RNA_file
    conda:
        "envs/pydev_1.yml"
    params:
        "01.pulled_reads"
    output:
        directory("01.pulled_reads/{tissue}")
    shell:"""
    mkdir 01.pulled_reads/{wildcards.tissue}
    store_regions=$(cat {input.bed_file} | awk -F'\t' '{{ print $1 ":" $2 "-" $3 }}')
    for i in $store_regions ; do
        samtools view -b -h {input.rna_file} ${{i}} > 01.pulled_reads/{wildcards.tissue}/${{i}}.bam ;
    done
    echo "This completed fine"
    """
rule samtools_sort:
    input:
        "01.pulled_reads/{tissue}/{i}.bam"
    params:
        "{i}"
    output:
        "01.pulled_reads/{tissue}/{i}.sorted.bam"
    shell:
        "samtools sort -T sorted_reads/{params}.tmp {input} > {output}"

rule samtools_index:
    input:
        "01.pulled_reads/{tissue}/{i}.sorted.bam"
    output:
        "01.pulled_reads/{tissue}/{i}.sorted.bam.bai"
    shell:
        "samtools index {input}"

rule string_tie_assembly:
    input:
        "01.pulled_reads/{tissue}/{i}.sorted.bam"
    output:
        "02.string_tie_assembly/{tissue}/{i}_assembly.gtf"
    shell:
        "stringtie {input} -f 0.0 -a 0 -m 50 -c 3.0 -f 0.0 -o {output}"
def trigger_aggregate(wildcards):
    checkpoint_output = checkpoints.pull_reads_for_BAM.get(**wildcards).output[0]
    x = expand("02.string_tie_assembly/{tissue}/{i}_assembly.merged.gtf",
        tissue = wildcards.tissue,
        i = glob_wildcards(os.path.join(checkpoint_output, "{i}.bam")).i)
    return x

#Aggregate function that triggers rule
rule combine_all_gtf_things:
    input:
        trigger_aggregate
    output:
        "03.final_stuff/{tissue}.merged.gtf"
    shell:"""
    cat {input} > {output}
    """
After the command has run to completion, Snakemake returns (exited with non-zero exit code) for some mysterious reason. I can watch the output being generated in the file and it appears to be correct, so I'm unsure why it's throwing this error.
The checkpoint I have generated is modeled after this:
https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html
Related Questions that have gone unanswered:
Snakemake checkpoint (exited with non-zero exit code)
It appears that this issue was somehow caused by the {tissue} wildcard being used as the output directory itself. As to why this throws a non-zero exit status I am unsure. This was fixed by simply appending a _dir suffix to {tissue} in the path above.
More on the issue can be found here:
https://bitbucket.org/snakemake/snakemake/issues/1303/snakemake-checkpoint-throws-exited-with
Not sure if this is a problem, but mkdir 01.pulled_reads/{wildcards.tissue} will fail if the directory already exists or if 01.pulled_reads does not exist before mkdir is executed.
Try adding the -p option to mkdir, i.e. mkdir -p 01.pulled_reads/{wildcards.tissue}
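Putting the two suggestions together, here is a minimal sketch of the checkpoint with the output directory renamed to {tissue}_dir and mkdir called with -p (the conda and params directives are omitted for brevity; downstream rules that read from this directory would need the same _dir suffix):

checkpoint pull_reads_for_BAM:
    input:
        bed_file = get_lncRNA_file,
        rna_file = get_RNA_file
    output:
        # the wildcard no longer names the bare output directory itself
        directory("01.pulled_reads/{tissue}_dir")
    shell:"""
    mkdir -p 01.pulled_reads/{wildcards.tissue}_dir
    store_regions=$(cat {input.bed_file} | awk -F'\t' '{{ print $1 ":" $2 "-" $3 }}')
    for i in $store_regions ; do
        samtools view -b -h {input.rna_file} ${{i}} > 01.pulled_reads/{wildcards.tissue}_dir/${{i}}.bam ;
    done
    """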
I've been running into some issues with Snakemake not running the intermediate rules required by a checkpoint. After attempting to troubleshoot the issue, I believe the problem lies within the expand command in the aggregate_input function, but I cannot figure out why it is behaving the way it is.
Here is the current checkpoint documentation from snakemake which I have modeled this after https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#data-dependent-conditional-execution
rule all:
    input:
        expand("string_tie_assembly/{sample}.gtf", sample=sample),
        expand("combined_fasta/{sample}.fa", sample=sample),
        "aggregated_fasta/all_fastas_combined.fa"

checkpoint clustering:
    input:
        "string_tie_assembly_merged/merged_{sample}.gtf"
    output:
        clusters = directory("split_gtf_file/{sample}")
    shell:
        """
        mkdir -p split_gtf_file/{wildcards.sample} ;
        collapse_gtf_file.py -gtf {input} -o split_gtf_file/{wildcards.sample}/{wildcards.sample}
        """

rule gtf_to_fasta:
    input:
        "split_gtf_file/{sample}/{sample}_{i}.gtf"
    output:
        "lncRNA_fasta/{sample}/canidate_{sample}_{i}.fa"
    shell:
        "gffread -w {output} -g {reference} {input}"

rule rename_fasta_files:
    input:
        "lncRNA_fasta/{sample}/canidate_{sample}_{i}.fa"
    output:
        "lncRNA_fasta_renamed/{sample}/{sample}_{i}.fa"
    shell:
        "seqtk rename {input} {wildcards.sample}_{i} > {output}"

#Gather N number of output files from the GTF split
def aggregate_input(wildcards):
    checkpoint_output = checkpoints.clustering.get(**wildcards).output[0]
    x = expand("lncRNA_fasta_renamed/{sample}/{sample}_{i}.fa",
        sample=sample,
        i=glob_wildcards(os.path.join(checkpoint_output, "{i}.fa")).i)
    print(x)
    return x

#Aggregate fasta from split GTF files together
rule combine_fasta_file:
    input:
        aggregate_input
    output:
        "combined_fasta/{sample}.fa"
    shell:
        "cat {input} > {output}"
#Aggregate the aggregated fasta files
def gather_files(wildcards):
    files = expand("combined_fasta/{sample}.fa", sample=sample)
    return(files)

rule aggregate_fasta_files:
    input:
        gather_files
    output:
        "aggregated_fasta/all_fastas_combined.fa"
    shell:
        "cat {input} > {output}"
The issue I keep running into is that, upon running this Snakemake file, the combine_fasta_file rule does not run. After spending more time with this error I realized that the aggregate_input function was not expanding: it returns an empty list [] instead of what I expect, which is a list of all files in the directory expanded, i.e. lncRNA_fasta_renamed/{sample}/{sample}_{i}.fa.
This is odd, especially given that checkpoint clustering does run correctly and the downstream output files are in rule all.
Does anyone have any idea why this would be the case, or can think of a reason this might happen?
Command used to run snakemake: snakemake -rs Assemble_regions.snake --configfile snake_config_files/annotated_group_config.yaml
Just figured this out. The issue was my aggregate command targeting the wrong file. Previously I had it written as
def aggregate_input(wildcards):
    checkpoint_output = checkpoints.clustering.get(**wildcards).output[0]
    x = expand("lncRNA_fasta_renamed/{sample}/{sample}_{i}.fa",
        sample=sample,
        i=glob_wildcards(os.path.join(checkpoint_output, "{i}.fa")).i)
    print(x)
    return x
The issue, however, is that it was targeting the wrong files. Instead of globbing {i}.fa, it should be globbing the files produced by the checkpoint clustering. So changing this code to
def aggregate_input(wildcards):
    checkpoint_output = checkpoints.clustering.get(**wildcards).output[0]
    print(checkpoint_output)
    x = expand("lncRNA_fasta_renamed/{sample}/{sample}_{i}.fa",
        sample=wildcards.sample,
        i=glob_wildcards(os.path.join(checkpoint_output, "{sample}_{i}.gtf")).i)
    print(x)
    return x
Solved the issue.
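To make the behaviour concrete, here is a small, hypothetical illustration (the file names are invented) of what the corrected pattern resolves to:

# Suppose checkpoint clustering wrote
#   split_gtf_file/sampleA/sampleA_0.gtf
#   split_gtf_file/sampleA/sampleA_1.gtf
# Then, inside aggregate_input with wildcards.sample == "sampleA":
#   glob_wildcards(os.path.join(checkpoint_output, "{sample}_{i}.gtf")).i  ->  ['0', '1']
# and the expand() call returns
#   ['lncRNA_fasta_renamed/sampleA/sampleA_0.fa',
#    'lncRNA_fasta_renamed/sampleA/sampleA_1.fa']
# which are exactly the files rule rename_fasta_files knows how to build.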
I am new to Snakemake and I am trying to develop some pipelines. I am encountering some problems when I use wildcards while trying to automate my bioinformatic analyses as much as possible. I run into trouble when the pipeline becomes more complex (as shown below). It looks like Snakemake does not resolve the wildcards correctly. During a dry run of the Snakefile, the wildcard values look correct in the execution of some rules. However, the same wildcards lead to an error in a different step (rule) of the pipeline, and I cannot figure out why. Below I provide the code and the output message of a dry run.
num=["327905-LR-41624_normal","327907-LR-41624_tumor"]
num_normal=["327905-LR-41624"]
num_tumor=["327907-LR-41624"]
path="/path/to/Snakemake/"
genome="/path/to/references_genome/Mus_musculus.GRCm38.dna_rm.toplevel.fa"

rule all:
    input:
        expand("/path/to/Snakemake/AS-{num_tum}_tumor_no_dupl_sort_RG_LB.bam", num_tum=num_tumor),
        expand("/path/to/Snakemake/AS-{num_norm}_normal_no_dupl_sort_RG_LB.bam", num_norm=num_normal)

ruleorder: samtools_sort > remove_duplicates > samtools_index #> add_readgroup_tumor > add_readgroup_normal
rule trim_galore:
    input:
        r1="/path/to/Snakemake/AS-{num}_R1.fastq",
        r2="/path/to/Snakemake/AS-{num}_R2.fastq"
    output:
        "/path/to/Snakemake/AS-{num }_R1_val_1.fq",
        "/path/to/Snakemake/AS-{num }_R2_val_2.fq"
    shell:
        "module load trim-galore/0.5.0 ; module load pypy/2.7-6.0.0 ; trim_galore --output_dir /path/to/Snakemake/ --paired {input.r1} {input.r2} "

rule bwa_mem:
    input:
        R1="/path/to/Snakemake/AS-{num}_R1_val_1.fq",
        R2="/path/to/Snakemake/AS-{num}_R2_val_2.fq"
    output:
        "/path/to/Snakemake/AS-{num}.bam"
    shell:
        "module load samtools/default ; module load bwa/0.7.8 ; bwa mem {genome} {input.R1} {input.R2} | samtools view -h -b > {output} "
rule samtools_sort:
    input:
        "/path/to/Snakemake/AS-{num}.bam"
    output:
        "/path/to/Snakemake/AS-{num}_sort.bam"
    shell:
        "module load samtools/default ; samtools sort -n -O BAM {input} > {output} "

rule remove_duplicates:
    input:
        "/path/to/Snakemake/AS-{num}_sort.bam"
    output:
        outbam="/path/to/Snakemake/AS-{num}_no_dupl_sort.bam",
        metrics="/path/to/Snakemake/AS-{num}_dupl_metrics.txt"
    shell:
        "module load gatk/4.0.9.0 ; gatk MarkDuplicates -I {input} -O {output.outbam} -M {output.metrics} --REMOVE_DUPLICATES=true "

rule samtools_index:
    input:
        "/path/to/Snakemake/AS-{num}_no_dupl_sort.bam"
    output:
        "/path/to/Snakemake/AS-{num}_no_dupl_sort.bam.bai"
    shell:
        "module load samtools/default ; samtools index {input} "
rule add_readgroup_normal:
    input:
        "/path/to/Snakemake/AS-{num_normal}_normal_no_dupl_sort.bam"
    output:
        "/path/to/Snakemake/AS-{num_normal}_normal_no_dupl_sort_RG_LB.bam"
    shell:
        "module load gatk/4.0.9.0 ; gatk AddOrReplaceReadGroups -PL Illumina -LB { num_normal } -PU { num_normal } -SM NORMAL -I { input } -O {output} "

rule add_readgroup_tumor:
    input:
        "/path/to/Snakemake/AS-{num_tumor}_tumor_no_dupl_sort.bam"
    output:
        "/path/to/Snakemake/AS-{num_tumor}_tumor_no_dupl_sort_RG_LB.bam"
    shell:
        "module load gatk/4.0.9.0 ; gatk AddOrReplaceReadGroups -PL Illumina -LB { num_tumor } -PU { num_tumor } -SM TUMOR -I { input } -O {output} "
When I test the Snakefile with the command:
.local/bin/snakemake -s Snakefile_pipeline --dryrun
I get the following:
**Building DAG of jobs...**
**Job counts:**
**count jobs
1 add_readgroup_normal
1 add_readgroup_tumor
1 all
2 bwa_mem
2 remove_duplicates
2 samtools_sort
2 trim_galore
11**
**[Mon Apr 8 16:14:27 2019]
rule trim_galore:
input: /path/to/Snakemake/AS-327907-LR-41624_tumor_R1.fastq, /path/to/Snakemake/AS-327907-LR-41624_tumor_R2.fastq
output: /path/to/Snakemake/AS-327907-LR-41624_tumor_R1_val_1.fq, /path/to/Snakemake/AS-327907-LR-41624_tumor_R2_val_2.fq
jobid: 9
wildcards: num=327907-LR-41624_tumor**
**[Mon Apr 8 16:14:27 2019]
rule trim_galore:
input: /path/to/Snakemake/AS-327905-LR-41624_normal_R1.fastq, /path/to/Snakemake/AS-327905-LR-41624_normal_R2.fastq
output: /path/to/Snakemake/AS-327905-LR-41624_normal_R1_val_1.fq, /path/to/Snakemake/AS-327905-LR-41624_normal_R2_val_2.fq
jobid: 10
wildcards: num=327905-LR-41624_normal**
**[Mon Apr 8 16:14:27 2019]
rule bwa_mem:
input: /path/to/Snakemake/AS-327905-LR-41624_normal_R1_val_1.fq, /path/to/Snakemake/AS-327905-LR-41624_normal_R2_val_2.fq
output: /path/to/Snakemake/AS-327905-LR-41624_normal.bam
jobid: 8
wildcards: num=327905-LR-41624_normal**
**[Mon Apr 8 16:14:27 2019]
rule bwa_mem:
input: /path/to/Snakemake/AS-327907-LR-41624_tumor_R1_val_1.fq, /path/to/Snakemake/AS-327907-LR-41624_tumor_R2_val_2.fq
output: /path/to/Snakemake/AS-327907-LR-41624_tumor.bam
jobid: 7
wildcards: num=327907-LR-41624_tumor**
**[Mon Apr 8 16:14:27 2019]
rule samtools_sort:
input: /path/to/Snakemake/AS-327907-LR-41624_tumor.bam
output: /path/to/Snakemake/AS-327907-LR-41624_tumor_sort.bam
jobid: 5
wildcards: num=327907-LR-41624_tumor**
**[Mon Apr 8 16:14:27 2019]
rule samtools_sort:
input: /path/to/Snakemake/AS-327905-LR-41624_normal.bam
output: /path/to/Snakemake/AS-327905-LR-41624_normal_sort.bam
jobid: 6
wildcards: num=327905-LR-41624_normal**
**[Mon Apr 8 16:14:27 2019]
rule remove_duplicates:
input: /path/to/Snakemake/AS-327907-LR-41624_tumor_sort.bam
output: /path/to/Snakemake/AS-327907-LR-41624_tumor_no_dupl_sort.bam, /path/to/Snakemake/AS-327907-LR-41624_tumor_dupl_metrics.txt
jobid: 3
wildcards: num=327907-LR-41624_tumor**
**[Mon Apr 8 16:14:27 2019]
rule remove_duplicates:
input: /path/to/Snakemake/AS-327905-LR-41624_normal_sort.bam
output: /path/to/Snakemake/AS-327905-LR-41624_normal_no_dupl_sort.bam, /path/to/Snakemake/AS-327905-LR-41624_normal_dupl_metrics.txt
jobid: 4
wildcards: num=327905-LR-41624_normal**
**[Mon Apr 8 16:14:27 2019]
rule add_readgroup_normal:
input: /path/to/Snakemake/AS-327905-LR-41624_normal_no_dupl_sort.bam
output: /path/to/Snakemake/AS-327905-LR-41624_normal_no_dupl_sort_RG_LB.bam
jobid: 2
wildcards: num_normal=327905-LR-41624**
**RuleException in line 93 of /home/l136n/Snakefile_mapping_snv_call_pipeline2:
NameError: The name ' num_normal ' is unknown in this context. Please make sure that you defined that variable. Also note that braces not used for variable access have to be escaped by repeating them, i.e. {{print $1}}**
I have googled the error but found little help. Also, I double-checked the pipeline for any inconsistency. What I expect as output is indicated in rule all. The rules add_readgroup_normal and add_readgroup_tumor are supposed to take different subsets of the input files generated by the previous steps, which run on all files. I wonder if the problem somehow arises because of this separation into two subsets.
I repeat that I am quite new to Snakemake, so I might be missing something silly somewhere! Any help would be really appreciated, as I am completely stuck!
Thank you so much in advance!
num=["327905-LR-41624_normal","327907-LR-41624_tumor"]
normal=["327905-LR-41624_normal"]
num_tumor=["327907-LR-41624_tumor"]
path="/path/to/Snakemake/"
genome="/icgc/dkfzlsdf/analysis/B210/references_genome/Mus_musculus.GRCm38.dna_rm.toplevel.fa"

rule all:
    input:
        "/path/to/Snakemake/AS-327905-LR-41624_normal_R1_val_1.fq",
        "/path/to/Snakemake/AS-327905-LR-41624_normal_R2_val_2.fq",
        "/path/to/Snakemake/AS-327907-LR-41624_tumor_R1_val_1.fq",
        "/path/to/Snakemake/AS-327907-LR-41624_tumor_R2_val_2.fq",
        "/path/to/Snakemake/AS-327905-LR-41624_normal_no_dupl_sort.bam.bai",
        "/path/to/Snakemake/AS-327907-LR-41624_tumor_no_dupl_sort.bam.bai",
        "/path/to/Snakemake/AS-327905-LR-41624_normal_RG.bam"
        "/path/to/Snakemake/AS-327907-LR-41624_tumor_RG.bam"
rule trim_galore:
    input:
        r1="/path/to/Snakemake/AS-{num}_R1.fastq",
        r2="/path/to/Snakemake/AS-{num}_R2.fastq"
    output:
        "/path/to/Snakemake/AS-{num }_R1_val_1.fq",
        "/path/to/Snakemake/AS-{num }_R2_val_2.fq"
    shell:
        "module load trim-galore/0.5.0 ; module load pypy/2.7-6.0.0 ; trim_galore --output_dir /path/to/Snakemake/ --paired {input.r1} {input.r2} "

rule bwa_mem:
    input:
        R1="/path/to/Snakemake/AS-{num}_R1_val_1.fq",
        R2="/path/to/Snakemake/AS-{num}_R2_val_2.fq"
    output:
        "/path/to/Snakemake/AS-{num}.bam"
    shell:
        "module load samtools/default ; module load bwa/0.7.8 ; bwa mem {genome} {input.R1} {input.R2} | samtools view -h -b > {output} "

rule samtools_sort:
    input:
        "/path/to/Snakemake/AS-{num}.bam"
    output:
        "/path/to/Snakemake/AS-{num}_sort.bam"
    shell:
        "module load samtools/default ; samtools sort -n -O BAM {input} > {output} "
rule remove_duplicates:
    input:
        "/path/to/Snakemake/AS-{num}_sort.bam"
    output:
        outbam="/path/to/Snakemake/AS-{num}_no_dupl_sort.bam",
        metrics="/path/to/Snakemake/AS-{num}_dupl_metrics.txt"
    shell:
        "module load gatk/4.0.9.0 ; gatk MarkDuplicates -I {input} -O {output.outbam} -M {output.metrics} --REMOVE_DUPLICATES=true "

rule samtools_index:
    input:
        "/path/to/Snakemake/AS-{num}_no_dupl_sort.bam"
    output:
        "/path/to/Snakemake/AS-{num}_no_dupl_sort.bam.bai"
    shell:
        "module load samtools/default ; samtools index {input} "
rule add_readgroup_normal:
    input:
        "/path/to/Snakemake/AS-{normal}_no_dupl_sort.bam"
    output:
        "/path/to/Snakemake/AS-{normal}_RG.bam"
    shell:
        "module load gatk/4.0.9.0 ; gatk AddOrReplaceReadGroups -PL Illumina -LB { wildcards.normal } -PU { wildcards.normal } -SM NORMAL -I { input } -O {output} "

rule add_readgroup_tumor:
    input:
        "/path/to/Snakemake/AS-{num}_no_dupl_sort.bam"
    output:
        "/path/to/Snakemake/AS-{num_,'.*tumor.*'}_RG.bam"
    shell:
        "module load gatk/4.0.9.0 ; gatk AddOrReplaceReadGroups -PL Illumina -LB { wildcards.num } -PU { wildcards.num } -SM TUMOR -I { input } -O {output} "
Error:
Building DAG of jobs...
MissingInputException in line 37 of /home/l136n/Snakefile_mapping_snv_call_pipeline2b1:
Missing input files for rule trim_galore:
/path/to/Luca/Snakemake/AS-327905-LR-41624_normal_RG.bam/path/to/Luca/Snakemake/AS-327907-LR-41624_tumor_RG_R1.fastq
/path/to/Snakemake/AS-327905-LR-41624_normal_RG.bam/path/to/Luca/Snakemake/AS-327907-LR-41624_tumor_RG_R2.fastq
Wildcards are accessible in shell using the syntax {wildcards.var}, not {var}. You have the latter in rule add_readgroup_normal.
Source.
I thought I would provide the solution, even if the post is a bit old now. The error was simply due to the presence of spaces inside "{ wildcards.var }".
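For completeness, a minimal sketch of the corrected add_readgroup_normal, with {wildcards.normal} written without the surrounding spaces (the tumor rule would be fixed the same way):

rule add_readgroup_normal:
    input:
        "/path/to/Snakemake/AS-{normal}_no_dupl_sort.bam"
    output:
        "/path/to/Snakemake/AS-{normal}_RG.bam"
    shell:
        "module load gatk/4.0.9.0 ; gatk AddOrReplaceReadGroups -PL Illumina -LB {wildcards.normal} -PU {wildcards.normal} -SM NORMAL -I {input} -O {output} "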
I'm interviewing for a front-end developer job and have been given a coding test to build a simple front-end interface. I've been given the server, which is written in Ruby (2.1.3) and has 3 endpoints which I am to make use of in my front-end client. I have no experience whatsoever with Ruby, but I followed their instructions for setting up the server and it seems to work - I get responses from all the endpoints. The problem is that my client app gets no response, as it lives on a different "domain" (actually, they're both just different ports of localhost). It seems that the API is not setting the "Access-Control-Allow-Origin" header, but I don't want to go back to them asking how to fix this because I'm afraid it will reflect poorly on my test.
Below is the server's file structure, and I've also included the contents of a few files which seem to be relevant. If anyone wants to see other files, please just comment. I'm sure this is simple for anyone who knows Ruby, but I haven't the foggiest clue.
D:.
¦ .gitkeep
¦ client.rb
¦ config.ru
¦ foo.sqlite3
¦ Gemfile
¦ Gemfile.lock
¦ Rakefile
¦ README.md
¦
+---app
¦ +---controllers
¦ ¦ api_controller.rb
¦ ¦ application_controller.rb
¦ ¦ not_found_controller.rb
¦ ¦ payments_controller.rb
¦ ¦
¦ +---models
¦ ¦ payment.rb
¦ ¦
¦ +---views
¦ booking.html
¦ confirmation.html
¦
+---config
¦ boot.rb
¦ dispatcher.rb
¦
+---db
¦ ¦ schema.rb
¦ ¦ seeds.rb
¦ ¦
¦ +---migrate
¦ 20150331094122_create_payments.rb
¦
+---lib
¦ my_application.rb
¦
+---log
¦ development.log
¦ test.log
¦
+---public
¦ ¦ 404.html
¦ ¦ 500.html
¦ ¦
¦ +---css
¦ style.css
¦
+---script
¦ console
¦ server
¦
+---spec
¦ ¦ spec_helper.rb
¦ ¦
¦ +---acceptance
¦ ¦ api_endpoint_spec.rb
¦ ¦ not_found_spec.rb
¦ ¦
¦ +---models
¦ payment_spec.rb
¦
+---vendor
+---libs
foobar_goodies
boot.rb
ENV['RACK_ENV'] ||= 'development'
# Bundler
require 'bundler/setup'
Bundler.require :default, ENV['RACK_ENV'].to_sym
require_relative '../lib/my_application.rb'
root_path = MyApplication.root
lib_path = File.join(MyApplication.root, 'lib')
app_path = File.join(MyApplication.root, 'app')
[root_path, lib_path, app_path].each { |path| $LOAD_PATH.unshift(path) }
ENV['PEERTRANSFER_ROOT'] = root_path
require 'config/dispatcher'
require 'sinatra/activerecord'
set :database, { adapter: "sqlite3", database: "foo.sqlite3" }
require 'app/models/payment'
my_application.rb
module MyApplication
  class << self
    def root
      File.dirname(__FILE__) + '/..'
    end

    def views_path
      root + '/app/views'
    end

    def public_folder
      root + '/public'
    end
  end
end
dispatcher.rb
require 'controllers/application_controller'
require 'controllers/not_found_controller'
require 'controllers/api_controller'
require 'controllers/payments_controller'
module MyApplication
  class Dispatcher
    def call(env)
      path_info = env['PATH_INFO']

      app = case path_info
            when %r{^/api} then ApiController.new
            when %r{^/payment} then PaymentsController.new
            else NotFoundController.new
            end

      app.call(env)
    end
  end
end
application_controller.rb
class ApplicationController < Sinatra::Base
  set :views, MyApplication.views_path
  set :public_folder, MyApplication.public_folder

  not_found do
    html_path = File.join(settings.public_folder, '404.html')
    File.read(html_path)
  end

  error do
    raise request.env['sinatra.error'] if self.class.test?
    File.read(File.join(settings.public_folder, '500.html'))
  end
end
api_endpoint_spec.rb
require 'spec_helper'
require 'models/payment'
describe 'API Endpoint' do
  it 'responds with a JSON welcoming message' do
    get '/api'
    expect(last_response.status).to eq(200)
    expect(last_response.body).to eq('{"message":"Hello Developer"}')
  end

  it 'returns all the stored payments' do
    Payment.all.map(&:delete)
    Payment.new(reference: 'any reference', amount: 10000).save

    get '/api/bookings'
    expect(last_response.status).to eq(200)
    expect(last_response.body).to eq("{\"bookings\":[{\"reference\":\"any reference\",\"amount\":10000,\"country_from\":null,\"sender_full_name\":null,\"sender_address\":null,\"school\":null,\"currency_from\":null,\"student_id\":null,\"email\":null}]}")
  end

  def app
    MyApplication::Dispatcher.new
  end
end
Sinatra is a simple and lightweight web framework. The general idea is that you write response routes like this:
get '/api' do
  "Hello world"
end
When you make an HTTP GET request to yoursite.com/api you will get "Hello world" as the response.
Now to add the header you want, this should do the trick:
get '/api' do
  response['Access-Control-Allow-Origin'] = '*'
  "Hello world"
end