Accessing Snakemake Config Samples - bioinformatics

I have a rule that needs to take 2 samples and combine them.
This is what my samples look like in my config file:
samples:
  group1:
    sra1:
      sample: "SRR14724462"
      cell_line: "NA24385"
      exome_bedfile: "/bedfiles/truseq.sorted.bed"
    sra2:
      sample: "SRR14724472"
      cell_line: "NA24385"
      exome_bedfile: "/bedfiles/idt.sorted.bed"
  group2:
    sra1:
      sample: "SRR14724463"
      cell_line: "NA12878"
      exome_bedfile: "/bedfiles/truseq.sorted.bed"
    sra2:
      sample: "SRR14724473"
      cell_line: "NA12878"
      exome_bedfile: "/bedfiles/idt.sorted.bed"
Essentially, I want to pair the sra1 samples of group1 and group2, and the sra2 samples of group1 and group2, giving these combinations:
SRR14724462 and SRR14724463
SRR14724472 and SRR14724473
This is my rule and rule all:
rule combine:
    output:
        r1 = TRIMMED_DIR + "/{sample1}_{sample2}_R1.fastq",
        r2 = TRIMMED_DIR + "/{sample1}_{sample2}_R2.fastq"
    params:
        trimmed_dir = TRIMMED_DIR,
        a = "{sample1}",
        b = "{sample2}"
    shell:
        """
        cd {params.trimmed_dir}
        /combine.sh {params.a}_R1_trimmed.fastq {params.a}_R2_trimmed.fastq {params.b}_R1_trimmed.fastq {params.b}_R2_trimmed.fastq
        """

rule all:
    input:
        expand(TRIMMED_DIR + "/{sample1}_{sample2}_R1.fastq", sample1=list_a, sample2=list_b),
        expand(TRIMMED_DIR + "/{sample1}_{sample2}_R2.fastq", sample1=list_a, sample2=list_b)
This works, EXCEPT that it produces all of these combinations:
SRR14724462 and SRR14724463
SRR14724462 and SRR14724473
SRR14724472 and SRR14724463
SRR14724472 and SRR14724473
I only want these combinations:
SRR14724462 and SRR14724463
SRR14724472 and SRR14724473
Note: I haven't shown how I got list_a and list_b, but essentially they are:
list_a = ['SRR14724462', 'SRR14724472']
list_b = ['SRR14724463', 'SRR14724473']
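By default expand() builds every combination of its wildcard lists (its combinator is itertools.product), which is exactly the 4-way output above. Snakemake's expand() also accepts a combinator as its second positional argument, so in the Snakefile something like expand(TRIMMED_DIR + "/{sample1}_{sample2}_R1.fastq", zip, sample1=list_a, sample2=list_b), and the same for R2, should pair the lists position by position instead. The plain-Python sketch below imitates that behaviour outside Snakemake so it can be checked quickly. expand_like() is a hypothetical stand-in, not Snakemake's own code, and pairing samples by their shared sra key across group1 and group2 is an assumption about how the config is meant to be read.

```python
from itertools import product

# Parsed form of the question's config.yaml (cell_line and exome_bedfile omitted).
config = {
    "samples": {
        "group1": {
            "sra1": {"sample": "SRR14724462"},
            "sra2": {"sample": "SRR14724472"},
        },
        "group2": {
            "sra1": {"sample": "SRR14724463"},
            "sra2": {"sample": "SRR14724473"},
        },
    }
}


def expand_like(pattern, combinator=product, **wildcards):
    """Hypothetical stand-in for Snakemake's expand(): fill `pattern` once for
    each tuple of wildcard values produced by `combinator`."""
    names = list(wildcards)
    return [
        pattern.format(**dict(zip(names, values)))
        for values in combinator(*(wildcards[name] for name in names))
    ]


# Build both lists from group1's keys so sra1 lines up with sra1, sra2 with sra2.
group1 = config["samples"]["group1"]
group2 = config["samples"]["group2"]
list_a = [group1[key]["sample"] for key in group1]
list_b = [group2[key]["sample"] for key in group1]

pattern = "{sample1}_{sample2}_R1.fastq"
all_pairs = expand_like(pattern, sample1=list_a, sample2=list_b)       # product: 4 names
zipped = expand_like(pattern, zip, sample1=list_a, sample2=list_b)     # zip: 2 names
print(zipped)
```

With zip the two lists should have the same length, because Python's zip silently drops any unmatched trailing items.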

Related

WildcardError in Snakefile

I've been trying to run the following bioinformatic script:
configfile: "config.yaml"

WORK_TRIM = config["WORK_TRIM"]
WORK_KALL = config["WORK_KALL"]

rule all:
    input:
        expand(WORK_KALL + "quant_result_{condition}", condition=config["conditions"])

rule kallisto_quant:
    input:
        fq1 = WORK_TRIM + "{sample}_1_trim.fastq.gz",
        fq2 = WORK_TRIM + "{sample}_2_trim.fastq.gz",
        idx = WORK_KALL + "Homo_sapiens.GRCh38.cdna.all.fa.index"
    output:
        WORK_KALL + "quant_result_{condition}"
    shell:
        "kallisto quant -i {input.idx} -o {output} {input.fq1} {input.fq2}"
However, I keep getting an error like this:
WildcardError in line 13 of /home/user/directory/Snakefile:
Wildcards in input files cannot be determined from output files:
'sample'
To explain briefly, kallisto quant produces three outputs: abundance.h5, abundance.tsv and run_info.json. Each of those files needs to go into its own newly created condition directory. I don't understand what is going wrong, and I'd appreciate any help on this.
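If you think about it, you are not giving Snakemake enough information. The output pattern contains only {condition}, so when Snakemake matches a requested file such as quant_result_control it has no way to work out which value of {sample} the inputs should use.
Say "condition" is either "control" or "treated", with samples "C" and "T", respectively. You need to tell Snakemake about the association control: C, treated: T. You could do this using input functions, for example lambda functions: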
cond2samp = {'control': 'C', 'treated': 'T'}

rule all:
    input:
        expand("quant_result_{condition}", condition=cond2samp.keys())

rule kallisto_quant:
    input:
        fq1 = lambda wc: "%s_1_trim.fastq.gz" % cond2samp[wc.condition],
        fq2 = lambda wc: "%s_2_trim.fastq.gz" % cond2samp[wc.condition],
        idx = "Homo_sapiens.GRCh38.cdna.all.fa.index"
    output:
        "quant_result_{condition}"
    shell:
        "kallisto quant -i {input.idx} -o {output} {input.fq1} {input.fq2}"

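To make the lambda mechanism in the answer above concrete, the sketch below evaluates the same two input functions by hand. Snakemake calls each input function with a wildcards object whose attributes hold the values matched from the output pattern; here types.SimpleNamespace stands in for that object, which is an assumption made only for illustration, and resolve_inputs() is a hypothetical helper.

```python
from types import SimpleNamespace

cond2samp = {"control": "C", "treated": "T"}

# Same input functions as in rule kallisto_quant.
input_functions = {
    "fq1": lambda wc: "%s_1_trim.fastq.gz" % cond2samp[wc.condition],
    "fq2": lambda wc: "%s_2_trim.fastq.gz" % cond2samp[wc.condition],
}


def resolve_inputs(output_condition):
    """Hypothetical helper: mimic the step after Snakemake has matched
    quant_result_{condition} against a requested file and knows {condition}."""
    wc = SimpleNamespace(condition=output_condition)  # stand-in wildcards object
    return {name: func(wc) for name, func in input_functions.items()}


for condition in cond2samp:
    print(condition, resolve_inputs(condition))
```

A condition missing from cond2samp raises KeyError inside the function, so every condition requested by rule all needs an entry in the mapping.
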
How to split the reports in a single dataset into multiple datasets using JCL

A dataset has many reports in it. I need to copy only the first report to another dataset. How can I achieve this using JCL?
Below is a sample of what the dataset looks like. My requirement is to extract only the records under the R0A report.
---Report - R0A---
List of Payments
Date : 23/07/2021
Name Payment-Amt Due-Date
AAAA 233.04 15/08/2021
BBBB 38.07 16/08/2021
---Report - R0B---
List of Payments
Date : 23/07/2021
Name Payment-Amt Due-Date
AAAA 233.04 15/08/2021
BBBB 38.07 16/08/2021
---Report - R0C---
List of Payments
Date : 23/07/2021
Name Payment-Amt Due-Date
AAAA 233.04 15/08/2021
BBBB 38.07 16/08/2021
If the size of the reports is fixed, you can use sort with the COPY and STOPAFT= options:
SORT FIELDS=COPY,STOPAFT=6
If you need a report other than the first, add the SKIPREC= option to skip the records of the reports before it. E.g. to get the third report, skip the 12 records (2 reports of 6 records each) that precede it:
SORT FIELDS=COPY,SKIPREC=12,STOPAFT=6
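For fixed-size reports both values follow from the report number n and the report size: SKIPREC = (n - 1) * size and STOPAFT = size. The Python sketch below computes them and emulates the skip-then-stop behaviour on an in-memory list of records. That list is a simplification (a real SORT step reads the data set record by record), the helper names are hypothetical, and the 6-record size is taken from the sample data.

```python
def fixed_report_window(n, report_size):
    """Return (SKIPREC, STOPAFT) for the n-th report (1-based) when every
    report is exactly report_size records long."""
    if n < 1 or report_size < 1:
        raise ValueError("n and report_size must be positive")
    return (n - 1) * report_size, report_size


def copy_window(records, skiprec, stopaft):
    """Emulate SKIPREC/STOPAFT on a list: skip the first `skiprec` records,
    then keep at most `stopaft` of the remaining ones."""
    return records[skiprec:skiprec + stopaft]


# The sample data set: three reports of 6 records each.
records = []
for name in ("R0A", "R0B", "R0C"):
    records += [
        f"---Report - {name}---",
        "List of Payments",
        "Date : 23/07/2021",
        "Name Payment-Amt Due-Date",
        "AAAA 233.04 15/08/2021",
        "BBBB 38.07 16/08/2021",
    ]

skip, stop = fixed_report_window(3, 6)
print(f"SORT FIELDS=COPY,SKIPREC={skip},STOPAFT={stop}")
third_report = copy_window(records, skip, stop)
```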
If the reports differ in length, you could run a simple REXX.
/* REXX - NOTE This is only a skeleton. Error checking must be added. */
/* This code has not been tested, so thorough testing is due.          */
"ALLOC F(INP) DS('your.fully.qualed.input.data.set.name') SHR"
"EXECIO * DISKR INP ( STEM InpRec. FINIS"
"FREE F(INP)"

TRUE  = 1
FALSE = 0
ReportStartIndicator = "---Report"
ReportName   = "- R0B---"
ReportHeader = ReportStartIndicator ReportName   /* "---Report - R0B---" */

/* Find the header line of the wanted report. LEAVE keeps ii on that line */
/* (a WHILE test would run after ii is incremented, one line too far).    */
ReportCopy = FALSE
do ii = 1 to InpRec.0
  if InpRec.ii = ReportHeader
  then do
    ReportCopy = TRUE
    leave
  end
end

if ReportCopy
then do
  /* Copy the header, then every line up to the start of the next report. */
  OutRec.1 = InpRec.ii
  OutCnt = 1
  do jj = ii + 1 to InpRec.0 while ReportCopy = TRUE
    if word( InpRec.jj, 1 ) = ReportStartIndicator  /* Start of next report? */
    then ReportCopy = FALSE
    else do
      OutCnt = OutCnt + 1
      OutRec.OutCnt = InpRec.jj
    end
  end
  "ALLOC F(OUT) DS('your.fully.qualed.output.data.set.name')" ,
    "NEW CATLG SPACE(......) RECFM(....) LRECL(....)"
  "EXECIO" OutCnt "DISKW OUT ( STEM OutRec. FINIS"
  "FREE F(OUT)"
  say "Done copying report." OutCnt "records have been copied."
end
else do
  say "Report" ReportName "not found."
  exit 16
end
As the comment in the REXX says, I haven't tested this code. Also, error checking needs to be added, especially for the TSO host commands (ALLOC, EXECIO, FREE).
All of these solutions copy a single report to another data set. Your title mentions splitting into multiple data sets; the single-report solutions above should be a good starting point for that.
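The REXX logic (find a header, copy until the next header) extends to the multi-data-set case from the title: split the records at every line whose first word is ---Report and keep one group per report. The Python sketch below shows that grouping on an in-memory list. Writing each group to its own data set is left out, and split_reports() is a hypothetical helper, not a mainframe utility.

```python
def split_reports(records, start_indicator="---Report"):
    """Group records into reports. A report starts at a line whose first word
    is start_indicator and runs up to (not including) the next such line.
    Returns a dict mapping each header line to its list of records."""
    reports = {}
    current = None
    for record in records:
        words = record.split()
        if words and words[0] == start_indicator:
            current = record.strip()
            reports[current] = [record]  # the header is the report's first record
        elif current is not None:
            reports[current].append(record)
        # Lines before the first header are ignored.
    return reports


# Reports of different lengths, which the fixed SKIPREC/STOPAFT approach cannot handle.
records = [
    "---Report - R0A---",
    "List of Payments",
    "AAAA 233.04 15/08/2021",
    "---Report - R0B---",
    "List of Payments",
    "AAAA 233.04 15/08/2021",
    "BBBB 38.07 16/08/2021",
]

reports = split_reports(records)
for header, lines in reports.items():
    print(header, len(lines), "records")
```

If the same report name could appear twice, a list of (header, records) pairs would be safer than a dict, which keeps only the last group for a repeated header.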

Jmeter - How to compare two numbers using threshold

For my testing I need to compare two real numbers:
a) 0.070103 vs. b) 0.0701029999999999986
What is the best way to achieve that, ideally with a threshold included?
How about rounding them?
Something like:
import java.math.MathContext
def a = 0.070103
def b = 0.0701029999999999986
def roundedA = a.round(new MathContext(5))
def roundedB = b.round(new MathContext(5))
log.info('Rounded a: ' + roundedA)
log.info('Rounded b: ' + roundedB)
log.info('Numbers are equal: ' + roundedA.equals(roundedB))
More information:
BigDecimal.round()
MathContext
Scripting JMeter Assertions in Groovy - A Tutorial
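Rounding compares the numbers at a fixed number of significant digits; an explicit threshold is the other common approach: treat the numbers as equal when their absolute difference is at most a chosen tolerance. The sketch below uses Python's decimal.Decimal to keep it language-neutral, and the tolerance values are arbitrary examples. In a Groovy assertion the same check on BigDecimal values could be written as (a - b).abs() <= tolerance, since BigDecimal provides abs() and Groovy maps <= to compareTo().

```python
from decimal import Decimal


def within_threshold(a, b, tolerance):
    """True when |a - b| <= tolerance. Arguments are strings so that Decimal
    keeps every digit exactly instead of inheriting binary float rounding."""
    return abs(Decimal(a) - Decimal(b)) <= Decimal(tolerance)


a = "0.070103"
b = "0.0701029999999999986"
print(abs(Decimal(a) - Decimal(b)))      # 1.4E-18
print(within_threshold(a, b, "1e-9"))    # True
print(within_threshold(a, b, "1e-20"))   # False
```

A relative tolerance (difference divided by the larger magnitude) is the usual alternative when the compared values can differ widely in scale.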

Snakemake rules

I want to use Snakemake to build a bioinformatics pipeline. I googled it and read the documentation and other material, but I still don't know how to get it to work.
Here are some of my raw data files.
Rawdata/010_0_bua_1.fq.gz, Rawdata/010_0_bua_2.fq.gz
Rawdata/11_15_ap_1.fq.gz, Rawdata/11_15_ap_2.fq.gz
... (they are all paired files.)
Here is my align.snakemake:
from os.path import join

STAR_INDEX = "/app/ref/ensembl/human/idx/"
SAMPLE_DIR = "Rawdata"
SAMPLES, = glob_wildcards(SAMPLE_DIR + "/{sample}_1.fq.gz")
R1 = '{sample}_1.fq.gz'
R2 = '{sample}_2.fq.gz'

rule alignment:
    input:
        r1 = join(SAMPLE_DIR, R1),
        r2 = join(SAMPLE_DIR, R2),
    params:
        STAR_INDEX = STAR_INDEX
    output:
        "Align/{sample}.bam"
    message:
        "--- mapping STAR---"
    shell:
        """
        mkdir -p Align/{wildcards.sample}
        STAR --genomeDir {params.STAR_INDEX} --readFilesCommand zcat --readFilesIn {input.r1} {input.r2} --outFileNamePrefix Align/{wildcards.sample}/log
        """
That's it. I ran this file with "snakemake -np -s align.snakemake" and got this error:
WorkflowError:
Target rules may not contain wildcards. Please specify concrete files or a rule without wildcards.
I'm sorry to ask this question when so many people seem to use Snakemake without trouble. Any help would be really appreciated. Sorry for my English.
P.S. I have read the official documentation and tutorial but still have no idea.
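Oh, I figured it out. Here is my answer to my own question, in case it helps others. The missing piece was a rule all: Snakemake uses the first rule in the file as the default target, so that rule must request concrete files (built here with expand() over SAMPLES) rather than patterns containing wildcards.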
from os.path import join

STAR_INDEX = "/app/ref/ensembl/human/idx/"
SAMPLE_DIR = "Rawdata"
SAMPLES, = glob_wildcards(SAMPLE_DIR + "/{sample}_1.fq.gz")
R1 = '{sample}_1.fq.gz'
R2 = '{sample}_2.fq.gz'

rule all:
    input:
        expand("Align/{sample}/Aligned.toTranscriptome.out.bam", sample=SAMPLES)

rule alignment:
    input:
        r1 = join(SAMPLE_DIR, R1),
        r2 = join(SAMPLE_DIR, R2)
    params:
        STAR_INDEX = STAR_INDEX
    output:
        "Align/{sample}/Aligned.toTranscriptome.out.bam"
    threads:
        8
    message:
        "--- Mapping STAR---"
    # --outFileNamePrefix ends in "/" so that STAR writes exactly the declared
    # output, Align/<sample>/Aligned.toTranscriptome.out.bam (a ".../log" prefix
    # would produce Align/<sample>/logAligned.toTranscriptome.out.bam instead).
    shell:
        """
        mkdir -p Align/{wildcards.sample}
        STAR --genomeDir {params.STAR_INDEX} --outSAMunmapped Within --outFilterType BySJout --outSAMattributes NH HI AS NM MD --outFilterMultimapNmax 20 --outFilterMismatchNmax 999 --outFilterMismatchNoverLmax 0.04 --alignIntronMin 20 --alignIntronMax 1000000 --alignMatesGapMax 1000000 --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --sjdbScore 1 --runThreadN {threads} --genomeLoad NoSharedMemory --outSAMtype BAM Unsorted --quantMode TranscriptomeSAM --outSAMheaderHD \#HD VN:1.4 SO:unsorted --readFilesCommand zcat --readFilesIn {input.r1} {input.r2} --outFileNamePrefix Align/{wildcards.sample}/
        """

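To see which targets the rule all in the answer above requests, the sketch below mimics glob_wildcards() and expand() for the raw files listed in the question. It works on a hard-coded list of paths instead of the file system, and samples_from_paths() is a hypothetical helper. It relies on a {sample} wildcard matching ".+" by default, so only the _1 files match and the sample name is everything before _1.fq.gz.

```python
import re

# The raw files listed in the question (normally found on disk by glob_wildcards).
raw_files = [
    "Rawdata/010_0_bua_1.fq.gz",
    "Rawdata/010_0_bua_2.fq.gz",
    "Rawdata/11_15_ap_1.fq.gz",
    "Rawdata/11_15_ap_2.fq.gz",
]


def samples_from_paths(paths, sample_dir="Rawdata"):
    """Hypothetical stand-in for glob_wildcards(sample_dir + "/{sample}_1.fq.gz"):
    return the {sample} value of every path matching the pattern, in input order."""
    pattern = re.compile(re.escape(sample_dir) + r"/(?P<sample>.+)_1\.fq\.gz")
    samples = []
    for path in paths:
        match = pattern.fullmatch(path)
        if match:
            samples.append(match.group("sample"))
    return samples


SAMPLES = samples_from_paths(raw_files)
# Equivalent of expand("Align/{sample}/Aligned.toTranscriptome.out.bam", sample=SAMPLES)
targets = ["Align/{sample}/Aligned.toTranscriptome.out.bam".format(sample=s) for s in SAMPLES]
for target in targets:
    print(target)
```

Each of these concrete paths is then matched against the output of rule alignment, which is how Snakemake fills in {sample} for that rule.
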
Generate macros from names defined by list

I have these definitions in my Makefile:
PREFIX = PRE
POSTFIXES = POST1 POST2 POST3
Now I would like to programmatically generate the following macros:
NAME_1 = PRE_POST1
NAME_2 = PRE_POST2
NAME_3 = PRE_POST3
#...
How to do that?
This does what you want, assuming the number in NAME_# is taken literally from each postfix (POST1 gives NAME_1):
$(foreach f,$(POSTFIXES),$(eval NAME_$(subst POST,,$f) = $(PREFIX)_$f))
Result:
NAME_1 = PRE_POST1
NAME_2 = PRE_POST2
NAME_3 = PRE_POST3
Explanation:
Remove POST from each postfix leaving just the number: $(subst POST,,$f)
Concatenate NAME_ with the number from the previous step: NAME_$(subst POST,,$f)
Concatenate $(PREFIX) and the current postfix to create the desired value string: $(PREFIX)_$f
Use $(eval) to assign the value to the computed variable name: $(eval NAME_$(subst POST,,$f) = $(PREFIX)_$f)
Do that all for each postfix in the list: $(foreach f,$(POSTFIXES),$(eval NAME_$(subst POST,,$f) = $(PREFIX)_$f))
Update for sequential NAME_# variables unrelated to POSTFIXES values:
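make doesn't really do arithmetic at all, so you need to play games to "count". (Thanks to the fantastic GMSL for showing me this trick.) N starts as a one-word list; $(words $N) is the current count, and each $(eval N += x) appends another word for the next iteration.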
POSTFIXES = POST_X POST_Y POST_Z
N := x
$(foreach f,$(POSTFIXES),$(eval NAME_$(words $N) = $(PREFIX)_$f)$(eval N += x))
Result:
NAME_1 = PRE_POST_X
NAME_2 = PRE_POST_Y
NAME_3 = PRE_POST_Z
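As a cross-check of what the two $(foreach ...) lines should produce, here is a small Python model of both variants. It only mirrors the naming logic (stripping POST versus counting positions) and is not a replacement for make; the function names are hypothetical.

```python
def names_from_postfix(prefix, postfixes):
    """Model of $(eval NAME_$(subst POST,,$f) = $(PREFIX)_$f):
    the NAME_ suffix is the postfix with every "POST" removed."""
    return {"NAME_" + f.replace("POST", ""): f"{prefix}_{f}" for f in postfixes}


def names_sequential(prefix, postfixes):
    """Model of the counting trick: $(words $N) is 1 on the first pass and
    grows by one each time $(eval N += x) runs."""
    names = {}
    counter = ["x"]  # N := x
    for f in postfixes:
        names[f"NAME_{len(counter)}"] = f"{prefix}_{f}"  # NAME_$(words $N)
        counter.append("x")                              # N += x
    return names


print(names_from_postfix("PRE", ["POST1", "POST2", "POST3"]))
print(names_sequential("PRE", ["POST_X", "POST_Y", "POST_Z"]))
```

The model also shows why the first variant depends on the POSTn naming: with POST_X the subst leaves _X, which would give a variable called NAME__X.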
