Generate macros from names defined by list - makefile

I have these definitions in a Makefile:
PREFIX = PRE
POSTFIXES = POST1 POST2 POST3
Now I would like to programmatically generate the following macros:
NAME_1 = PRE_POST1
NAME_2 = PRE_POST2
NAME_3 = PRE_POST3
#...
How can I do that?

This does what you want assuming NAME_# was literal.
$(foreach f,$(POSTFIXES),$(eval NAME_$(subst POST,,$f) = $(PREFIX)_$f))
Result:
NAME_1 = PRE_POST1
NAME_2 = PRE_POST2
NAME_3 = PRE_POST3
Explanation:
Remove POST from each postfix leaving just the number: $(subst POST,,$f)
Concatenate NAME_ with the number from the previous step: NAME_$(subst POST,,$f)
Concatenate $(PREFIX) and the current postfix to create the desired value string: $(PREFIX)_$f
Use $(eval) to assign the value to the computed variable name: $(eval NAME_$(subst POST,,$f) = $(PREFIX)_$f)
Do that all for each postfix in the list: $(foreach f,$(POSTFIXES),$(eval NAME_$(subst POST,,$f) = $(PREFIX)_$f))
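If it helps to see the same steps outside of make, here is a rough Python mirror of the one-liner. It is only an illustration of the logic, not something make runs; the generate function and the macros dictionary are names made up for this sketch.

```python
# Hypothetical Python mirror of the make one-liner (illustration only).
PREFIX = "PRE"
POSTFIXES = "POST1 POST2 POST3".split()

def generate(prefix, postfixes):
    macros = {}
    for f in postfixes:                      # $(foreach f,$(POSTFIXES),...)
        number = f.replace("POST", "")       # $(subst POST,,$f)
        name = "NAME_" + number              # NAME_$(subst POST,,$f)
        macros[name] = prefix + "_" + f      # $(eval NAME_... = $(PREFIX)_$f)
    return macros

macros = generate(PREFIX, POSTFIXES)
for name, value in macros.items():
    print(name, "=", value)
```

Each loop iteration corresponds to one $(eval) call, which is why the variable name can be computed before the assignment happens.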
Update for sequential NAME_# variables unrelated to POSTFIXES values:
make doesn't do math, at all really, so you need to play games to "count". (Thanks to the fantastic GMSL for showing me this trick.)
POSTFIXES = POST_X POST_Y POST_Z
N := x
$(foreach f,$(POSTFIXES),$(eval NAME_$(words $N) = $(PREFIX)_$f)$(eval N += x))
Result:
NAME_1 = PRE_POST_X
NAME_2 = PRE_POST_Y
NAME_3 = PRE_POST_Z
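The counting trick may be easier to follow as a Python analogy: N is a list of words that grows by one on every iteration, and $(words $N) is simply its length. This is a sketch of the idea only; generate_sequential is a made-up name.

```python
# Hypothetical Python analogy of counting with a growing word list.
PREFIX = "PRE"
POSTFIXES = "POST_X POST_Y POST_Z".split()

def generate_sequential(prefix, postfixes):
    macros = {}
    n = ["x"]                                          # N := x
    for f in postfixes:                                # $(foreach f,$(POSTFIXES),...)
        macros["NAME_%d" % len(n)] = prefix + "_" + f  # NAME_$(words $N) = $(PREFIX)_$f
        n.append("x")                                  # N += x
    return macros

macros = generate_sequential(PREFIX, POSTFIXES)
for name, value in macros.items():
    print(name, "=", value)
```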


Accessing Snakemake Config Samples

I have a rule that needs to take 2 samples and combine them.
This is what my samples look like in my config file:
samples:
  group1:
    sra1:
      sample: "SRR14724462"
      cell_line: "NA24385"
      exome_bedfile: "/bedfiles/truseq.sorted.bed"
    sra2:
      sample: "SRR14724472"
      cell_line: "NA24385"
      exome_bedfile: "/bedfiles/idt.sorted.bed"
  group2:
    sra1:
      sample: "SRR14724463"
      cell_line: "NA12878"
      exome_bedfile: "/bedfiles/truseq.sorted.bed"
    sra2:
      sample: "SRR14724473"
      cell_line: "NA12878"
      exome_bedfile: "/bedfiles/idt.sorted.bed"
Essentially I want to combine the sra1 samples of group1 and group2 together, and the sra2 samples of group1 and group2 together, giving these combinations:
SRR14724462 and SRR14724463
SRR14724472 and SRR14724473
This is my rule and rule all:
rule combine:
    output:
        r1 = TRIMMED_DIR + "/{sample1}_{sample2}_R1.fastq",
        r2 = TRIMMED_DIR + "/{sample1}_{sample2}_R2.fastq"
    params:
        trimmed_dir = TRIMMED_DIR,
        a = "{sample1}",
        b = "{sample2}"
    shell:
        """
        cd {params.trimmed_dir}
        /combine.sh {params.a}_R1_trimmed.fastq {params.a}_R2_trimmed.fastq {params.b}_R1_trimmed.fastq {params.b}_R2_trimmed.fastq
        """
rule all:
    input:
        expand(TRIMMED_DIR + "/{sample1}_{sample2}_R1.fastq", sample1=list_a, sample2=list_b),
        expand(TRIMMED_DIR + "/{sample1}_{sample2}_R2.fastq", sample1=list_a, sample2=list_b)
This works EXCEPT it does these combinations:
SRR14724462 and SRR14724463
SRR14724462 and SRR14724473
SRR14724472 and SRR14724463
SRR14724472 and SRR14724473
I only want these combinations:
SRR14724462 and SRR14724463
SRR14724472 and SRR14724473
Note: Not shown is how I got list_a and list_b, but essentially they are:
list_a = ['SRR14724462', 'SRR14724472']
list_b = ['SRR14724463', 'SRR14724473']
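The difference between the four combinations and the two wanted ones can be reproduced in plain Python. This is a minimal sketch, assuming the default expand behaves like a Cartesian product over its keyword lists; the TRIMMED_DIR value here is a placeholder, since the question does not show it.

```python
from itertools import product

TRIMMED_DIR = "trimmed"  # placeholder value, not shown in the question
list_a = ['SRR14724462', 'SRR14724472']
list_b = ['SRR14724463', 'SRR14724473']
pattern = TRIMMED_DIR + "/{sample1}_{sample2}_R1.fastq"

# Every element of list_a combined with every element of list_b (4 paths)
all_combinations = [pattern.format(sample1=a, sample2=b)
                    for a, b in product(list_a, list_b)]

# Elements paired by position only (2 paths)
pairwise = [pattern.format(sample1=a, sample2=b)
            for a, b in zip(list_a, list_b)]

print(len(all_combinations), "paths from product")
print("\n".join(pairwise))
```

In a Snakefile, the positional pairing is usually requested by passing zip as the second argument, as in expand(pattern, zip, sample1=list_a, sample2=list_b); it is worth checking this against the Snakemake version in use.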

WildcardError in Snakefile

I've been trying to run the following bioinformatic script:
configfile: "config.yaml"
WORK_TRIM = config["WORK_TRIM"]
WORK_KALL = config["WORK_KALL"]
rule all:
    input:
        expand(WORK_KALL + "quant_result_{condition}", condition=config["conditions"])
rule kallisto_quant:
    input:
        fq1 = WORK_TRIM + "{sample}_1_trim.fastq.gz",
        fq2 = WORK_TRIM + "{sample}_2_trim.fastq.gz",
        idx = WORK_KALL + "Homo_sapiens.GRCh38.cdna.all.fa.index"
    output:
        WORK_KALL + "quant_result_{condition}"
    shell:
        "kallisto quant -i {input.idx} -o {output} {input.fq1} {input.fq2}"
However, I keep getting an error like this:
WildcardError in line 13 of /home/user/directory/Snakefile:
Wildcards in input files cannot be determined from output files:
'sample'
Just to explain briefly, kallisto quant will produce 3 outputs: abundance.h5, abundance.tsv and run_info.json. Each of those files needs to be sent to its own newly created condition directory. I don't understand exactly what is going wrong. I'd appreciate any help on this.
If you think about it, you are not giving snakemake enough information.
Say "condition" is either "control" or "treated" with samples "C" and "T", respectively. You need to tell snakemake about the association control: C, treated: T. You could do this with input functions (functions as input files), for instance lambda functions. For example:
cond2samp = {'control': 'C', 'treated': 'T'}
rule all:
    input:
        expand("quant_result_{condition}", condition=cond2samp.keys())
rule kallisto_quant:
    input:
        fq1 = lambda wc: "%s_1_trim.fastq.gz" % cond2samp[wc.condition],
        fq2 = lambda wc: "%s_2_trim.fastq.gz" % cond2samp[wc.condition],
        idx = "Homo_sapiens.GRCh38.cdna.all.fa.index"
    output:
        "quant_result_{condition}"
    shell:
        "kallisto quant -i {input.idx} -o {output} {input.fq1} {input.fq2}"
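To see what the lambdas resolve to, you can call them by hand outside Snakemake. This is just a sketch: SimpleNamespace stands in for the wildcards object that Snakemake would pass, and the variable names fq1, fq2 and wc are only reused here for illustration.

```python
from types import SimpleNamespace

cond2samp = {'control': 'C', 'treated': 'T'}

# The same input functions as in the rule
fq1 = lambda wc: "%s_1_trim.fastq.gz" % cond2samp[wc.condition]
fq2 = lambda wc: "%s_2_trim.fastq.gz" % cond2samp[wc.condition]

# Stand-in for the wildcards derived from the output quant_result_control
wc = SimpleNamespace(condition="control")
inputs = (fq1(wc), fq2(wc))
print(inputs)  # ('C_1_trim.fastq.gz', 'C_2_trim.fastq.gz')
```

The point is that the only wildcard snakemake can infer from the output is condition, so the sample name has to be looked up from it rather than appear as its own wildcard.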

What is the better logic for identifying the latest versioned files from a big list

I have 200 images in a folder and each file may exist in several versions (example: car_image#2, car_image#2, bike_image#2, etc.). My requirement is to build a utility to copy the latest version of every file from this directory to another.
My approach is:
Put the image names (without the version numbers) into a list
Eliminate the duplicates from the list
Iterate through the list and identify the latest version of each unique file (I am a little unclear on this step)
Can someone suggest a better idea or algorithm to achieve this?
My approach would be:
Make a list of unique names by getting each filename up to the #, only adding unique values.
Make a dictionary with filenames as keys, and set values to be the version number, updating when it's larger than the one stored.
Go through the dictionary and produce the filenames to grab.
My go-to would be a python script but you should be able to do this in pretty much whatever language you find suitable.
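Example code for getting the filename list:
#get the filename list
#file_directory is assumed to be an iterable of names such as "car_image#2"
myList = []
for x in file_directory:
    fname = x.split("#")[0]
    if fname not in myList:
        myList.append(fname)
#start every base name at version 0
myDict = {}
for x in myList:
    if x not in myDict:
        myDict[x] = 0
#keep the highest version number seen for each base name
for x in file_directory:
    fname = x.split("#")[0]
    fversion = int(x.split("#")[-1])
    if myDict[fname] < fversion:
        myDict[fname] = fversion
#rebuild the full filenames of the latest versions
flist = []
for x in myDict:
    fname = str(x) + "#" + str(myDict[x])
    flist.append(fname)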
Then flist would be a list of filenames of the most recent versions
I didn't run this or anything but hopefully it helps!
In Python 3:
>>> import random
>>> images = sorted(set(sum([['%s_image#%i' % (nm, random.randint(1,9)) for i in range(random.randint(2,5))] for nm in 'car bike cat dog man tree'.split()], [])))
>>> print('\n'.join(images))
bike_image#2
bike_image#3
bike_image#4
bike_image#5
car_image#2
car_image#7
cat_image#3
dog_image#2
dog_image#5
dog_image#9
man_image#1
man_image#2
man_image#4
man_image#6
man_image#7
tree_image#3
tree_image#4
>>> from collections import defaultdict
>>> image2max = defaultdict(int)
>>> for image in images:
        name, _, version = image.partition('#')
        version = int(version)
        if version > image2max[name]:
            image2max[name] = version

>>> # Max version
>>> for image in sorted(image2max):
        print('%s#%i' % (image, image2max[image]))

bike_image#5
car_image#7
cat_image#3
dog_image#9
man_image#7
tree_image#4
>>>
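Putting the pieces together, a copy utility might look roughly like the following. It is a sketch under the assumption that every filename ends in '#' followed by an integer, as in the examples above (a name with an extension after the number, or with no '#', would need extra handling); latest_versions and copy_latest are made-up names.

```python
import shutil
from pathlib import Path

def latest_versions(filenames):
    """Return the highest-numbered version of each name, e.g. 'car_image#7'."""
    newest = {}
    for filename in filenames:
        name, _, version = filename.partition('#')
        # Keep the largest version number seen so far for this base name
        newest[name] = max(newest.get(name, 0), int(version))
    return ['%s#%d' % (name, newest[name]) for name in sorted(newest)]

def copy_latest(src_dir, dst_dir):
    """Copy only the latest version of each file from src_dir to dst_dir."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    names = [p.name for p in src.iterdir() if p.is_file()]
    for filename in latest_versions(names):
        shutil.copy2(src / filename, dst / filename)

print(latest_versions(['car_image#2', 'car_image#7', 'bike_image#5', 'bike_image#10']))
```

Comparing the versions as integers matters: as strings, '10' would sort before '5'.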

Reversing and Splitting in Python

I have a file "names.txt". The contents are
"Smith,RobJones,MikeJane,SallyPetel,Brian"
and I want to read "names.txt" and make a new file "names2.txt" that looks like:
"Rob Smith Mike Jones Sally Jane Brian Petel"
I know I should be using .rstrip('\n') and .split(',')
So far I have:
namesfile = input('Enter name of file: ') #open names.txt
openfile = open(namesfile, 'r')
This will do exactly that. You might be able to polish this and make it more elegant and I encourage you to do so:
import re
with open('names.txt') as f:
    # Split the names by inserting a comma before every capital letter
    names = re.sub(r'([A-Z])(?![A-Z])', r',\1', f.read()).split(',')
    # Filter empty results
    names = [n for n in names if n != '']
    # Swap pairs with each other
    for i in range(len(names)):
        if (i + 1) % 2 == 0:
            names[i], names[i-1] = names[i-1], names[i]
    print(' '.join(names))
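One possible way to polish it, and to actually write names2.txt, is sketched below. It assumes each name is a single capital letter followed by lowercase letters, which fits the sample but not names like McDonald; swap_names and convert are made-up helper names.

```python
import re

def swap_names(text):
    """Turn 'Smith,RobJones,Mike' into 'Rob Smith Mike Jones'."""
    # Assumption: a name is one capital letter followed by lowercase letters
    names = re.findall(r'[A-Z][a-z]*', text)
    swapped = []
    # names alternates last, first, last, first, ...; emit first, then last
    for last, first in zip(names[0::2], names[1::2]):
        swapped += [first, last]
    return ' '.join(swapped)

def convert(in_path='names.txt', out_path='names2.txt'):
    """Read the names from in_path and write the swapped version to out_path."""
    with open(in_path) as src, open(out_path, 'w') as dst:
        dst.write(swap_names(src.read()))

print(swap_names("Smith,RobJones,MikeJane,SallyPetel,Brian"))
```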

Could anyone explain this makefile to me?

This is the makefile:
TOP=../..
DIRNAME=base_class/string
H = regexp.h regmagic.h string_version.h
CSRCS = regerror.c regsub.c EST_strcasecmp.c
TSRCS =
CPPSRCS = EST_String.cc EST_Regex.cc EST_Chunk.cc regexp.cc
LOCAL_DEFAULT_LIBRARY = eststring
SRCS = $(CPPSRCS) $(CSRCS)
OBJS = $(CPPSRCS:.cc=.o) $(CSRCS:.c=.o)
FILES = $(SRCS) $(TSRCS) $(H) Makefile
LOCAL_INCLUDES=-I.
ALL = .buildlibs
include $(TOP)/config/common_make_rules
Now, I know this part defines variables:
TOP=../..
DIRNAME=base_class/string
H = regexp.h regmagic.h string_version.h
CSRCS = regerror.c regsub.c EST_strcasecmp.c
TSRCS =
CPPSRCS = EST_String.cc EST_Regex.cc EST_Chunk.cc regexp.cc
LOCAL_DEFAULT_LIBRARY = eststring
SRCS = $(CPPSRCS) $(CSRCS)
What I do not know is:
OBJS = $(CPPSRCS:.cc=.o) $(CSRCS:.c=.o)
Please tell me the meaning of the above statement. It would be best if you could spell out what the statement omits. Thanks.
You can look this up in the GNU make manual under substitution references. The above is equivalent to writing $(CPPSRCS:%.cc=%.o) (and ditto for CSRCS). In both forms, make goes through each word in the variable and, if the word matches the left-hand side of the equals sign, replaces it with the right-hand side. So if a word matches the pattern %.cc (where % matches any sequence of characters), it is replaced with %.o (where % stands for the same text as in the original word). The form you see is a special case where you can omit the % if it's the first thing on both sides.
So, given CPPSRCS = EST_String.cc EST_Regex.cc EST_Chunk.cc regexp.cc, then $(CPPSRCS:.cc=.o) expands to EST_String.o EST_Regex.o EST_Chunk.o regexp.o.
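If it is easier to read as ordinary code, the short form behaves roughly like the Python below: replace the suffix at the end of each word and leave other words alone. This is only an emulation for illustration; subst_ref is a made-up name, not anything make provides.

```python
def subst_ref(value, old_suffix, new_suffix):
    """Rough emulation of $(VAR:old=new): swap a suffix at the end of each word."""
    result = []
    for word in value.split():
        if word.endswith(old_suffix):
            # Drop the old suffix and append the new one
            word = word[:len(word) - len(old_suffix)] + new_suffix
        result.append(word)
    return ' '.join(result)

CPPSRCS = "EST_String.cc EST_Regex.cc EST_Chunk.cc regexp.cc"
CSRCS = "regerror.c regsub.c EST_strcasecmp.c"

# OBJS = $(CPPSRCS:.cc=.o) $(CSRCS:.c=.o)
OBJS = subst_ref(CPPSRCS, ".cc", ".o") + " " + subst_ref(CSRCS, ".c", ".o")
print(OBJS)
```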
