Snakemake on cluster error: 'Wildcards' object has no attribute 'output'

I'm running into a 'Wildcards' object has no attribute 'output' error, similar to this earlier question: 'Wildcards' object has no attribute 'output', when I submit Snakemake jobs to my cluster. I'm wondering if you have any suggestions for how to make this compatible with the cluster?
While my rule annotate_snps works when I test it locally, I get the following error on the cluster:
input: results/CI226380_S4/vars/CI226380_S4_bwa_H37Rv_gatk.vcf.gz
output: results/CI226380_S4/vars/CI226380_S4_bwa_H37Rv_gatk_rename.vcf.gz, results/CI226380_S4/vars/CI226380_S4_bwa_H37Rv_gatk_tmp.vcf.gz, results/CI226380_S4/vars/CI226380_S4_bwa_H37Rv_gatk_ann.vcf.gz
log: results/CI226380_S4/vars/CI226380_S4_bwa_H37Rv_annotate_snps.log
jobid: 1139
wildcards: samp=CI226380_S4, mapper=bwa, ref=H37Rv
WorkflowError in line 173 of /oak/stanford/scg/lab_jandr/walter/tb/mtb/workflow/Snakefile:
'Wildcards' object has no attribute 'output'
My rule is defined as:
rule annotate_snps:
    input:
        vcf='results/{samp}/vars/{samp}_{mapper}_{ref}_gatk.vcf.gz'
    log:
        'results/{samp}/vars/{samp}_{mapper}_{ref}_annotate_snps.log'
    output:
        rename_vcf=temp('results/{samp}/vars/{samp}_{mapper}_{ref}_gatk_rename.vcf.gz'),
        tmp_vcf=temp('results/{samp}/vars/{samp}_{mapper}_{ref}_gatk_tmp.vcf.gz'),
        ann_vcf='results/{samp}/vars/{samp}_{mapper}_{ref}_gatk_ann.vcf.gz'
    params:
        bed=config['bed_path'],
        vcf_header=config['vcf_header']
    shell:
        '''
        # Rename Chromosome to be consistent with snpEff/Ensembl genomes.
        zcat {input.vcf} | sed 's/NC_000962.3/Chromosome/g' | bgzip > {output.rename_vcf}
        tabix {output.rename_vcf}
        # Run snpEff
        java -jar -Xmx8g {config[snpeff]} eff {config[snpeff_db]} {output.rename_vcf} -dataDir {config[snpeff_datapath]} -noStats -no-downstream -no-upstream -canon > {output.tmp_vcf}
        # Also use bed file to annotate vcf
        bcftools annotate -a {params.bed} -h {params.vcf_header} -c CHROM,FROM,TO,FORMAT/PPE {output.tmp_vcf} > {output.ann_vcf}
        '''
Thank you so much in advance!

The raw rule definition looks consistent, apart from the multiple references to the contents of config, e.g. config[snpeff].
One thing to check is whether the config definition on the single machine and on the cluster is the same. If it is not, there might be some content that is confusing Snakemake, e.g. if somehow config[snpeff] == "wildcards.output" (or something similar).
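As a quick sanity check, you could print the config values the rule relies on at workflow start and compare the local and cluster runs. This is only a minimal sketch using Snakemake's onstart handler; the listed keys are the ones referenced in the rule above:

onstart:
    # Print the relevant config entries so local and cluster logs can be diffed.
    for key in ('snpeff', 'snpeff_db', 'snpeff_datapath', 'bed_path', 'vcf_header'):
        print(key, '=', repr(config.get(key)))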

Related

'Wildcards' object has no attribute 'output'

I get an error for a rather simple rule. I have to write a task file for another program, which expects a tsv file. I read a certain number of parameters from my config file and write them to a file with a shell command.
Code:
rule create_tasks:
    output:
        temp("tasks_{sample}.tsv")
    params:
        ID="{sample}",
        file=lambda wc: samples["path"][wc.sample],
        bigwig=lambda wc: samples["bigwig"][wc.sample],
        ambig=lambda wc: samples["ambig"][wc.sample]
    shell:
        'echo -e "{params.ID}\t{params.file}" > {output}'
When I execute the workflow, I get the following error:
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cluster nodes: 1
Job counts:
count jobs
1 create_tasks
1
[Mon Oct 12 14:48:15 2020]
rule create_tasks:
output: tasks_sampleA.tsv
jobid: 0
wildcards: sample=sampleA
echo -e "sampleA /Path/To/sampleA.bed " > tasks_sampleA.tsv
WorkflowError in line 23 of /path/to/workflow.snakefile:
'Wildcards' object has no attribute 'output'
File "/path/to/miniconda/envs/snakemake_submit/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 111, in run_jobs
File "/path/to/miniconda/envs/snakemake_submit/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 1233, in run
I should mention that two of the variables are empty and that I expect the tabs/whitespaces in the echo command.
Does anybody have an explanation for why Snakemake is trying to find output in the wildcards? I am especially confused because it is printing the correct command.
I've run into this same problem.
The issue is probably in how you invoked Snakemake from the command line.
For example, this was my Snakefile rule:
rule sort:
input:
"{file}.bam",
output:
"{file}.sorted.bam",
"{file}.sorted.bai",
shell:
"sambamba sort {input}"
I don't even have params or wildcards explicitly anywhere in there.
But when I run it on my Slurm HPC with the following command, I get the same error:
snakemake -j 10 -c "sbatch {cluster.params}" -u cluster.yaml
The Wildcards (note the capital "W") and params objects weren't from the rule.
They came from the cluster execution of the rule: the error was thrown while parsing the cluster.yaml file, because it contained no cluster parameter specification for the sort rule.
I fixed this by adding

sort:
    params: "..."

to my cluster.yaml file.
In your case, add cluster submission options under a create_tasks: entry.
You can also add a __default__: entry whose submission parameters are used for any job that does not match another rule's entry.
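For illustration, a cluster.yaml along these lines should work with the sbatch {cluster.params} invocation shown above; the actual sbatch options are placeholders and depend on your cluster:

__default__:
    params: "--partition=normal --time=01:00:00 --mem=4G"

create_tasks:
    params: "--partition=normal --time=00:10:00 --mem=1G"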

problem with snakemake submitting jobs with multiple wildcards on SGE

I used Snakemake on an LSF cluster before and everything worked just fine. However, I recently migrated to an SGE cluster and I am getting a very strange error when I try to run a job with more than one wildcard.
When I try to submit a job based on this rule
rule download_reads:
    threads: 1
    output: "data/{sp}/raw_reads/{accesion}_1.fastq.gz"
    shell: "scripts/download_reads.sh {wildcards.sp} {wildcards.accesion} data/{wildcards.sp}/raw_reads/{wildcards.accesion}"
I get the following error (snakemake_clust.sh details below):
./snakemake_clust.sh data/Ecol1/raw_reads/SRA123456_1.fastq.gz
Building DAG of jobs...
Using shell: /bin/bash
Provided cluster nodes: 10
Job counts:
count jobs
1 download_reads
1
[Thu Jul 30 12:08:57 2020]
rule download_reads:
output: data/Ecol1/raw_reads/SRA123456_1.fastq.gz
jobid: 0
wildcards: sp=Ecol1, accesion=SRA123456
scripts/download_reads.sh Ecol1 SRA123456 data/Ecol1/raw_reads/SRA123456
Unable to run job: ERROR! two files are specified for the same host
ERROR! two files are specified for the same host
Exiting.
Error submitting jobscript (exit code 1):
Shutting down, this might take some time.
When I replace the sp wildcard with a constant, it works as expected:
rule download_reads:
    threads: 1
    output: "data/Ecol1/raw_reads/{accesion}_1.fastq.gz"
    shell: "scripts/download_reads.sh Ecol1 {wildcards.accesion} data/Ecol1/raw_reads/{wildcards.accesion}"
I.e. I get
Submitted job 1 with external jobid 'Your job 50731 ("download_reads") has been submitted'.
I wonder why I am having this problem; I am sure I used exactly the same rule on the LSF-based cluster before without any problem.
some details
The snakemake submitting script looks like this
#!/usr/bin/env bash
mkdir -p logs
snakemake $@ -p --jobs 10 --latency-wait 120 --cluster "qsub \
-N {rule} \
-pe smp64 \
{threads} \
-cwd \
-b y \
-o \"logs/{rule}.{wildcards}.out\" \
-e \"logs/{rule}.{wildcards}.err\""
-b y makes the command be executed as it is, and -cwd changes the working directory on the computing node to the working directory from which the job was submitted. The other flags/specifications are clear, I hope.
Also, I am aware of the --drmaa flag, but I think our cluster is not well configured for that. --cluster has so far been the more robust solution.
-- edit 1 --
When I execute exactly the same Snakefile locally (on the frontend, without the --cluster flag), the script gets executed as expected. It seems to be a problem with the interaction between Snakemake and the scheduler.
-o \"logs/{rule}.{wildcards}.out\" \
-e \"logs/{rule}.{wildcards}.err\""
This is a random guess... When a job has more than one wildcard, they are concatenated with a space before being substituted into logs/{rule}.{wildcards}.err. So despite the double quotes, SGE treats the resulting string as two files and throws the error. What if you use single quotes instead? Like:
-o 'logs/{rule}.{wildcards}.out' \
-e 'logs/{rule}.{wildcards}.err'
Alternatively, you could concatenate the wildcards in the rule and use the result on the command line. E.g.:
rule one:
    params:
        wc=lambda wc: '_'.join(wc)
    output: ...
Then use:
-o 'logs/{rule}.{params.wc}.out' \
-e 'logs/{rule}.{params.wc}.err'
(This second solution, if it works, kind of sucks though)
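Putting that second suggestion together with the download_reads rule from the question, a minimal sketch could look like this (whether SGE then accepts the paths still depends on your site configuration):

rule download_reads:
    threads: 1
    params:
        # Join all wildcard values with '_' so the result contains no spaces.
        wc=lambda wc: '_'.join(wc)
    output: "data/{sp}/raw_reads/{accesion}_1.fastq.gz"
    shell: "scripts/download_reads.sh {wildcards.sp} {wildcards.accesion} data/{wildcards.sp}/raw_reads/{wildcards.accesion}"

with the submit script then using:

    -o 'logs/{rule}.{params.wc}.out' \
    -e 'logs/{rule}.{params.wc}.err'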

chaincode argument error: unexpected end of JSON input

I have created a test network and I am able to install the chaincode I have created in golang. But when instantiating it I receive the following:
2020-03-24 08:00:00.843 UTC [chaincodeCmd] checkChaincodeCmdParams -> INFO 04a Using default escc
2020-03-24 08:00:00.844 UTC [chaincodeCmd] checkChaincodeCmdParams -> INFO 04b Using default vscc
Error: chaincode argument error: unexpected end of JSON input
If I build the code in its own directory, it compiles without problems.
I can install and instantiate the code in another development network, but not in one I have created from scratch.
Help would be appreciated!
Thanks!
Use quotation marks when referencing the CC_CONSTRUCTOR variable. Otherwise, bash word-splits the value on its inner spaces; the quotation marks inside the JSON do not prevent this:
peer chaincode instantiate -C $CC_CHANNEL_ID -n $CC_NAME -v $CC_VERSION -c "$CC_CONSTRUCTOR" -o $ORDERER_ADDRESS
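A quick way to see the difference (a minimal illustration, not part of the original commands):

export CC_CONSTRUCTOR='{ "Args" : [ "Message" , "Hello World - Init message" ] }'

# Unquoted: the value is word-split on whitespace, so -c most likely ends up
# receiving just '{', which is incomplete JSON.
printf '<%s>\n' $CC_CONSTRUCTOR

# Quoted: the whole JSON string is passed as a single argument.
printf '<%s>\n' "$CC_CONSTRUCTOR"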
Thanks. I am setting the environment variables and then calling instantiate. The same variables are set for the install, which works fine.
export CC_CONSTRUCTOR='{ "Args" : [ "Message" , "Hello World - Init message" ] }'
export CC_NAME="testcc"
export CC_PATH="testcc"
export CC_VERSION="1.1"
export CC_CHANNEL_ID="testchannel"
peer chaincode instantiate -C $CC_CHANNEL_ID -n $CC_NAME -v $CC_VERSION -c $CC_CONSTRUCTOR -o $ORDERER_ADDRESS
I have tried escaping some characters that might need it, but that does not work. And again, the very same golang code and JSON constructor work on another test environment.
If I unset the CC_CONSTRUCTOR variable, I receive a different error message, so with high probability that variable is the problem.
In some cases, this error is generated by
const stateValue = await ctx.stub.getState(state);
when the state does not exist.
In other cases, it is because
evaluateTransaction is used instead of submitTransaction when reading states
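For the first case, a guard before parsing avoids the empty result; this is a hypothetical fabric-contract-api handler sketch, not the asker's code:

'use strict';
const { Contract } = require('fabric-contract-api');

class MessageContract extends Contract {
    // getState resolves to an empty buffer when the key does not exist;
    // JSON.parse on that empty value throws "unexpected end of JSON input".
    async readState(ctx, key) {
        const stateValue = await ctx.stub.getState(key);
        if (!stateValue || stateValue.length === 0) {
            throw new Error(`State for key ${key} does not exist`);
        }
        return JSON.parse(stateValue.toString());
    }
}

module.exports = MessageContract;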

snakemake running nanopolish and making it wait until previous rule is done

Hi, I can run the different steps of nanopolish with Snakemake. But when I run it, it gives an error that the index file created in the bwa rule isn't available yet. After it gives this error, it creates the file that the error was about. If I run Snakemake again without removing files, it works, because the file is there. How can I tell Snakemake to wait with the next step until the first one is done? I have googled for ways to solve this problem and all I could find was priority and ruleorder; I have used those, but it still doesn't work. Here is the script that I use.
ruleorder: bwa > nanopolish

rule bwa:
    input:
        "nanopolish/assembly.fasta"
    output:
        "nanopolish/draft.fa"
    conda:
        "envs/nanopolish.yaml"
    priority:
        50
    shell:
        "bwa index {input} - > {output}"

rule nanopolish:
    input:
        "nanopolish/assembly.fasta",
        "zipped/zipped.gz"
    output:
        "nanopolish/reads.sorted.bam"
    conda:
        "envs/nanopolish.yaml"
    shell:
        "bwa mem -x ont2d {input} | samtools sort -o {output} -T reads.tmp"
You should take another look at the docs to properly understand the idea of Snakemake:
Rules describe how to create output files from input files.
A rule is not executed until all of its input files exist, so all you have to do is add the output of the bwa rule to the input of the nanopolish rule:
rule nanopolish:
    input:
        "nanopolish/assembly.fasta",
        "nanopolish/draft.fa",  # <-- output of bwa
        "zipped/zipped.gz"
Ruleorder and priority are not relevant solutions for your problem.
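For completeness, a minimal sketch of the revised rule, assuming (as in the original shell command) that bwa mem should receive the draft and the reads; adjust the inputs to your actual pipeline:

rule nanopolish:
    input:
        draft="nanopolish/draft.fa",   # output of the bwa rule, so bwa runs first
        reads="zipped/zipped.gz"
    output:
        "nanopolish/reads.sorted.bam"
    conda:
        "envs/nanopolish.yaml"
    shell:
        "bwa mem -x ont2d {input.draft} {input.reads} | samtools sort -o {output} -T reads.tmp"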

How can I figure out how many threads cut needs in Snakemake rule?

I use cut in one rule of my pipeline and it always throws an error, but without any error description.
When I try this command in a simple bash script, it works without any errors.
Here is the rule:
rule convert_bamheader:
input: bam/SERUM-ACT/exon_tagged_trimmed_mapped_cleaned.bam, stats/SERUM-ACT/good_barcodes_clean_filter.txt
output: bam/SERUM-ACT/exon_tagged_trimmed_mapped_cleaned_header.txt, bam/SERUM-ACT/exon_tagged_trimmed_mapped_cleaned_header_filtered.tsv
jobid: 15
wildcards: sample=SERUM-ACT
threads: 4
mkdir -p stats/SERUM-ACT
mkdir -p log/SERUM-ACT
samtools view bam/SERUM-ACT/exon_tagged_trimmed_mapped_cleaned.bam > bam/SERUM-ACT/exon_tagged_trimmed_mapped_cleaned_header.txt
cut -f 12,13,18,20-24 bam/SERUM-ACT/exon_tagged_trimmed_mapped_cleaned_header.txt | grep -f stats/SERUM-ACT/good_barcodes_clean_filter.txt > bam/SERUM-ACT/exon_tagged_trimmed_mapped_cleaned_header_filtered.tsv
Submitted DRMAA job 15 with external jobid 7027806.
Error in rule convert_bamheader:
jobid: 15
output: bam/SERUM-ACT/exon_tagged_trimmed_mapped_cleaned_header.txt, bam/SERUM-ACT/exon_tagged_trimmed_mapped_cleaned_header_filtered.tsv
ClusterJobException in line 256 of */pipeline.snake:
Error executing rule convert_bamheader on cluster (jobid: 15, external: 7027806, jobscript: */.snakemake/tmp.ewej7q4e/snakejob.convert_bamheader.15.sh). For detailed error see the cluster log.
Job failed, going on with independent jobs.
Exiting because a job execution failed. Look above for error message
Complete log: */.snakemake/log/2018-12-18T104741.092698.snakemake.log
I thought that it has something to do with the number of threads provided and the number of threads needed for the cut step, but I am not sure.
Perhaps someone can help me?
Cheers!
