I'm starting out with makefiles and am a bit puzzled about how patterns work. I have multiple different targets, each with a name-matching prerequisite. I would like to have a variable at the top storing all the "stems" of the targets and prerequisites, and then just add the prefix/suffix and a common recipe for all of them. So far I have tried:
names = stem1 stem2 stem3
all: $(names:%=dir/prefix_%.txt) $(names:%=dir/another_%.txt)
$(names:%=dir/prefix_%.txt): $(names:%=sourcedir/yetanother_%.xlsx)
	echo $@
	echo prerequisite_with_the_same_stem_as_current_target
Even though this makes all the targets one by one, every prerequisite is listed for each target, not just the one that matches the stem of the current target. I need it to match because I then pass the current target and its single prerequisite to a script, which makes the target. How do I pattern-match each prerequisite with its one target?
The misconception that you have is about how make handles lists. If you have a variable:
names = stem1 stem2 stem3
then make treats this as a list, but it expands the whole list at once every time you reference the variable. It does not operate on the list items one by one, because that would be close to uncontrollable, depending on the situation. Instead, it resorts to simple text replacement, so your line
all: $(names:%=dir/prefix_%.txt) $(names:%=dir/another_%.txt)
is parsed and variable-expanded, quite simply, into the string:
all: dir/prefix_stem1.txt dir/prefix_stem2.txt dir/prefix_stem3.txt ...etc...
The iterative list handling happens only within $(names:%=dir/prefix_%.txt) and the like; the line itself, after variable expansion, is just text that is fed to the second parsing step.
Along the same lines, your rule:
$(names:%=dir/prefix_%.txt): $(names:%=sourcedir/yetanother_%.xlsx)
expands to
dir/prefix_stem1.txt dir/prefix_stem2.txt dir/prefix_stem3.txt: sourcedir/yetanother_stem1.xlsx sourcedir/yetanother_stem2.xlsx sourcedir/yetanother_stem3.xlsx
which is shorthand for these three rules:
dir/prefix_stem1.txt: sourcedir/yetanother_stem1.xlsx sourcedir/yetanother_stem2.xlsx sourcedir/yetanother_stem3.xlsx
dir/prefix_stem2.txt: sourcedir/yetanother_stem1.xlsx sourcedir/yetanother_stem2.xlsx sourcedir/yetanother_stem3.xlsx
dir/prefix_stem3.txt: sourcedir/yetanother_stem1.xlsx sourcedir/yetanother_stem2.xlsx sourcedir/yetanother_stem3.xlsx
and nothing else. In effect, you told make that each target depends on all of the prerequisites.
With a little tweaking and a static pattern rule, you can achieve your goal, though:
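MY_TARGETS := $(names:%=dir/prefix_%.txt) # create full target names
$(MY_TARGETS) : dir/prefix_%.txt : sourcedir/yetanother_%.xlsx
	echo $@
	echo $<
Here the stem matched by % in the target pattern dir/prefix_%.txt is substituted into the prerequisite pattern, so each target gets only its own source file. Inside the recipe, $@ is the current target and $< is that single prerequisite, which is what you pass to your script.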
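To make the difference between the two rule forms concrete, here is a small Python model of what make does with the text. It is a sketch, not make's implementation: `subst_ref`, `explicit_rule` and `static_pattern_rule` are hypothetical helpers, and only single-`%` patterns are handled. The point it shows is that a multi-target rule hands every target the whole prerequisite list, while a static pattern rule derives each target's own stem and substitutes it into the prerequisite pattern.

```python
from typing import Dict, List, Optional, Tuple


def split_pattern(pattern: str) -> Tuple[str, str]:
    """Split a make pattern such as 'dir/prefix_%.txt' at its single '%'."""
    prefix, _, suffix = pattern.partition("%")
    return prefix, suffix


def match_stem(word: str, pattern: str) -> Optional[str]:
    """Return the non-empty stem if `word` matches `pattern`, else None."""
    prefix, suffix = split_pattern(pattern)
    if len(word) > len(prefix) + len(suffix) and word.startswith(prefix) and word.endswith(suffix):
        return word[len(prefix):len(word) - len(suffix)]
    return None


def subst_ref(words: List[str], pattern: str, replacement: str) -> List[str]:
    """Mimic $(var:pattern=replacement): rewrite matching words, keep the rest."""
    out = []
    for word in words:
        stem = match_stem(word, pattern)
        out.append(word if stem is None else replacement.replace("%", stem, 1))
    return out


def explicit_rule(targets: List[str], prereqs: List[str]) -> Dict[str, List[str]]:
    """'t1 t2 t3: p1 p2 p3' means one rule per target, each with ALL prerequisites."""
    return {t: list(prereqs) for t in targets}


def static_pattern_rule(targets: List[str], target_pattern: str,
                        prereq_pattern: str) -> Dict[str, List[str]]:
    """'targets: target-pattern: prereq-pattern' pairs each target with its own stem."""
    rules = {}
    for t in targets:
        stem = match_stem(t, target_pattern)
        if stem is None:
            # GNU make only warns about such a target; this sketch fails loudly.
            raise ValueError(f"{t!r} does not match {target_pattern!r}")
        rules[t] = [prereq_pattern.replace("%", stem, 1)]
    return rules


names = ["stem1", "stem2", "stem3"]
targets = subst_ref(names, "%", "dir/prefix_%.txt")
sources = subst_ref(names, "%", "sourcedir/yetanother_%.xlsx")

print(explicit_rule(targets, sources)["dir/prefix_stem2.txt"])
# ['sourcedir/yetanother_stem1.xlsx', 'sourcedir/yetanother_stem2.xlsx', 'sourcedir/yetanother_stem3.xlsx']
print(static_pattern_rule(targets, "dir/prefix_%.txt", "sourcedir/yetanother_%.xlsx")["dir/prefix_stem2.txt"])
# ['sourcedir/yetanother_stem2.xlsx']
```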
I'm writing a help output for a Bash script. Currently it looks like this:
dl [m|r]… (<file>|<URL> [m|r|<index>]…)…
The meaning that I'm trying to convey (and elsewhere describe with words) is that (after a potential "m" and/or "r") there can be an endless list of sets of arguments. The first argument in each set is always a file or URL and the further arguments can each be "m", "r" or a number. After that, it starts over with a file or URL and so on.
In my special case, I could just write this:
dl [m|r]… (<file>|<URL>) (<file>|<URL>|m|r|<index>)…
This works, because listing a URL and then another URL with nothing in between is allowed, as well as listing an arbitrarily long chain of "m"s (it's just useless to do so) and pretty much any other combination.
But what if that wasn't the case? What if I had for example a command like this:
change (<from> <to>)…
…which would be used e.g. like this:
change from1 to1 from2 to2 from3 to3
Would the bracket syntax be correct here? I just guessed it based on the grouping of (a|b), but I wasn't able to find any documentation that uses this for multiple, non-exclusive arguments that belong together. Is there even a standard for this?
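The question is about notation, but it may help to see what each usage line commits the script to. Below is a hypothetical Python sketch (`parse_dl` and `parse_change` are illustrative names, not part of the original script) of the grouping each line implies: in `dl`, every file or URL opens a new set that absorbs the following m/r/index tokens; in `change`, the arguments must come in complete pairs.

```python
from typing import List, Tuple

MODIFIERS = {"m", "r"}


def is_modifier(token: str) -> bool:
    """'m', 'r' or a non-negative integer index, per the dl usage line."""
    return token in MODIFIERS or token.isdigit()


def parse_dl(args: List[str]) -> Tuple[List[str], List[Tuple[str, List[str]]]]:
    """dl [m|r]... (<file>|<URL> [m|r|<index>]...)..."""
    i = 0
    leading: List[str] = []
    while i < len(args) and args[i] in MODIFIERS:
        leading.append(args[i])
        i += 1
    groups: List[Tuple[str, List[str]]] = []
    for token in args[i:]:
        if is_modifier(token):
            if not groups:
                raise ValueError(f"{token!r} must follow a file or URL")
            groups[-1][1].append(token)
        else:
            groups.append((token, []))  # a file or URL starts a new set
    if not groups:
        raise ValueError("at least one file or URL is required")
    return leading, groups


def parse_change(args: List[str]) -> List[Tuple[str, str]]:
    """change (<from> <to>)...: arguments must come in complete pairs."""
    if not args or len(args) % 2:
        raise ValueError("expected one or more <from> <to> pairs")
    return list(zip(args[0::2], args[1::2]))


print(parse_dl(["m", "a.txt", "r", "3", "https://example.org/b", "m"]))
# (['m'], [('a.txt', ['r', '3']), ('https://example.org/b', ['m'])])
print(parse_change(["from1", "to1", "from2", "to2", "from3", "to3"]))
# [('from1', 'to1'), ('from2', 'to2'), ('from3', 'to3')]
```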
I often find when adding rules to my workflow that I need to split large jobs up into batches. This means that my input/output files will branch out across temporary sets of batches for some rules before consolidating again into one input file for a later rule. For example:
rule all:
    input:
        expand("final_output/{sample}.counts", sample=config["samples"]) ## this final output relates to the blast rule in that it will feature a column defining transcript type
...
rule batch_prep:
    input: "transcriptome.fasta"
    output: expand("blast_input_{X}.fasta", X=[1, 2, 3, 4, 5])
    script: "scripts/split_transcriptome.sh"
rule blast:
    input: "blast_input_{X}.fasta",
    output: "output_blast.txt"
    script: "scripts/blastx.sh"
...
rule rsem:
    input:
        "transcriptome.fasta",
        "{sample}.fastq"
    output:
        "final_output/{sample}.counts"
    script:
        "scripts/rsem.sh"
In this simplified workflow, snakemake -n would show a separate rsem job for each sample (as expected, from wildcards set in rule all). However, blast would give a WildcardError stating that
Wildcards in input files cannot be determined from output files:
'X'
This makes sense, but I can't figure out a way for the Snakefile to submit separate jobs for each of the 5 batches above using the one blast template rule. I can't make separate rules for each batch, as the number of batches will vary on the size of the dataset. It seems it would be useful if I could define wildcards local to a rule. Does such a thing exist, or is there a better way to solve this issue?
I hope I understood your problem correctly; if not, feel free to correct me.
So, you want to call the rule blast for every "blast_input_{X}.fasta"?
Then, the batch wildcard would need to be carried over into the output.
rule blast:
    input: "blast_input_{X}.fasta",
    output: "output_blast_{X}.txt"
    script: "scripts/blastx.sh"
If you then later want to merge the batches again in another rule, just use expand in the input of that rule.
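rule merge_blast:  # hypothetical rule name
    input: expand("output_blast_{X}.txt", X=your_batches)
    output: "merged_blast_output.txt"
    shell: "cat {input} > {output}"  # assumes plain concatenation is an acceptable merge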
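Why this works: Snakemake starts from the files that are requested and matches each one against a rule's output pattern to recover the wildcard values. The following plain-Python sketch models that idea in simplified form. `expand_like` is a hypothetical stand-in for Snakemake's `expand()` (product of the given values), and `infer_wildcards` is a hypothetical, much-reduced version of the matching step, not Snakemake's actual code. It shows that each requested `output_blast_{X}.txt` pins down X for one blast job, while the original output `output_blast.txt` yields no X at all, which is exactly the WildcardError.

```python
import itertools
import re
from typing import Dict, List, Optional


def expand_like(pattern: str, **values: List) -> List[str]:
    """Simplified stand-in for expand(): one name per combination of values."""
    keys = list(values)
    return [
        pattern.format(**dict(zip(keys, combo)))
        for combo in itertools.product(*(values[k] for k in keys))
    ]


def infer_wildcards(output_pattern: str, path: str) -> Optional[Dict[str, str]]:
    """Match a requested file against an output pattern and return the
    wildcard values it implies (None if the pattern cannot produce the file)."""
    regex, pos = "", 0
    for m in re.finditer(r"\{(\w+)\}", output_pattern):
        regex += re.escape(output_pattern[pos:m.start()]) + f"(?P<{m.group(1)}>.+)"
        pos = m.end()
    regex += re.escape(output_pattern[pos:])
    match = re.fullmatch(regex, path)
    return match.groupdict() if match else None


# The merge rule requests one file per batch...
requested = expand_like("output_blast_{X}.txt", X=[1, 2, 3, 4, 5])
# ...and each requested file determines X for exactly one blast job.
for path in requested:
    print(path, infer_wildcards("output_blast_{X}.txt", path))
# With the original output "output_blast.txt" no X can be recovered, so the
# input "blast_input_{X}.fasta" stays undetermined.
print(infer_wildcards("output_blast.txt", "output_blast.txt"))  # {}
```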
I study genetic data from 288 fish samples (Fish_one, Fish_two ...)
I have four files per fish, each with a different suffix.
eg. for sample_name Fish_one:
file 1 = "Fish_one.1.fq.gz"
file 2 = "Fish_one.2.fq.gz"
file 3 = "Fish_one.rem.1.fq.gz"
file 4 = "Fish_one.rem.2.fq.gz"
I would like to apply the following concatenation commands to all my samples, maybe using a text file containing a list of all the sample names that would be fed to a loop?
cp sample_name.1.fq.gz sample_name.fq.gz
cat sample_name.2.fq.gz >> sample_name.fq.gz
cat sample_name.rem.1.fq.gz >> sample_name.fq.gz
cat sample_name.rem.2.fq.gz >> sample_name.fq.gz
In the end, I would have only one file per sample, ideally in a different folder.
I would be very grateful to receive a bit of help on this one, even though I'm sure the answer is quite simple for a non-novice!
Many thanks,
Noé
In the first place, the name of the cat command is mnemonic for "concatenate". It accepts multiple command-line arguments naming sources to concatenate to standard output, which is exactly what you want to do. It is poor form to use a cp and three cats where a single cat would do.
In the second place, although you certainly could use a file of name stems to drive the operation you describe, you likely don't need to go to the trouble of creating or maintaining such a file. Globbing will probably do the job satisfactorily. As long as there aren't any name stems that need to be excluded, I'd probably go with something like this:
for f in *.rem.1.fq.gz; do
    stem=${f%.rem.1.fq.gz}
    cat "$stem".{1,2,rem.1,rem.2}.fq.gz > "${other_dir}/${stem}.fq.gz"
done
That recognizes the groups present in the current working directory by the members whose names end with .rem.1.fq.gz. It extracts the common name stem from that member's name, then concatenates the four members to the correspondingly-named output file in the directory identified by ${other_dir}. It relies on brace expansion to form the arguments to cat, so as to minimize code and (IMO) improve clarity.
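For comparison, here is a minimal Python sketch of the same loop, assuming the same four suffixes in the same order as in the question. `concat_samples` is a hypothetical helper; it discovers stems by globbing for `*.rem.1.fq.gz` like the shell loop, or takes an explicit list of names (e.g. one read from a sample-list file, which the question mentions). Like cat, it appends raw bytes, so each output is a multi-member gzip file that gzip readers decompress as one stream.

```python
import gzip
import shutil
import tempfile
from pathlib import Path
from typing import Iterable, List, Optional

# Same order as the cp/cat sequence in the question.
SUFFIXES = (".1.fq.gz", ".2.fq.gz", ".rem.1.fq.gz", ".rem.2.fq.gz")
MARKER = ".rem.1.fq.gz"  # one member per group, used to discover the stems


def concat_samples(src_dir: str, out_dir: str,
                   stems: Optional[Iterable[str]] = None) -> List[Path]:
    """Concatenate the four files of each sample into out_dir/<stem>.fq.gz."""
    src, out = Path(src_dir), Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    if stems is None:
        stems = sorted(p.name[: -len(MARKER)] for p in src.glob("*" + MARKER))
    written = []
    for stem in stems:
        target = out / f"{stem}.fq.gz"
        with open(target, "wb") as dst:
            for suffix in SUFFIXES:
                # Byte-level append, exactly like cat.
                with open(src / f"{stem}{suffix}", "rb") as part:
                    shutil.copyfileobj(part, dst)
        written.append(target)
    return written


if __name__ == "__main__":
    # Small self-check with fake data in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp, "raw")
        src.mkdir()
        for suffix, text in zip(SUFFIXES, (b"A\n", b"B\n", b"C\n", b"D\n")):
            with gzip.open(src / f"Fish_one{suffix}", "wb") as fh:
                fh.write(text)
        (merged,) = concat_samples(str(src), str(Path(tmp, "merged")))
        with gzip.open(merged, "rb") as fh:
            print(merged.name, fh.read())  # Fish_one.fq.gz b'A\nB\nC\nD\n'
```

With a list file, the call would look like `concat_samples("raw", "merged", Path("samples.txt").read_text().split())`, where `samples.txt` is a hypothetical file with one sample name per line.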
I have hosts of two types: wirelessHostA[0..N] and wirelessHostB[0..N]. I want each host wirelessHostA[i] to send messages to the corresponding wirelessHostB[i]; for example, A[0] sends to B[0] and A[10] sends to B[10]. Expression-wise, I have something like this:
*.wirelessHostA[0..${N}].app[ * ].destAddresses = "wirelessHostB[0..${N}]"
although this is not correct. I am a bit unsure how to declare a variable that is iterated over within a run rather than a value per run.
You should not think of the lines in the INI file as assignments with which you can build procedural constructs like loops. Instead, think of them as pattern-matching rules. When a module needs a parameter, it scans the INI file from the start, line by line, and tries to match the first part (i.e. the part before =) against the current module path plus the parameter name. If it matches, it assigns the second part to the parameter. If not, it continues with the next line in the INI file.
So first write a pattern rule, then a value that can be evaluated in that context. In the value, you may refer to other parameters (those available in the module's context) or to additional contextual information, such as the matching submodule's index (if it is part of a vector). There are also functions to access the index of the parent module, etc.
In this case, we have a submodule vector of hosts, each of which contains a submodule vector of apps. The index operator would return the index of the current context module (its position in the app vector), but we actually need the index of the app's parent (its position in the host vector). There is a NED function for this too, called parentIndex(). So the solution would look like this:
*.wirelessHostA[*].app[*].destAddresses = "wirelessHostB[" + string(parentIndex()) + "]"
See https://doc.omnetpp.org/omnetpp/manual/#sec:ned-functions:category-ned for more info.
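The "first matching line wins" lookup described above can be modelled in a few lines of Python. This is a simplified sketch, not OMNeT++ code: `ModuleContext`, `key_regex` and `lookup` are hypothetical names, only `*` (any characters except a dot) and `**` (any characters, dots included) are supported, and values are Python callables standing in for NED expressions. It shows that one line with `[*]` serves every host, because the value is evaluated separately in each app's context, where the parent index differs.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class ModuleContext:
    """What a value expression can see (hypothetical, simplified)."""
    path: str          # full module path, e.g. "Net.wirelessHostA[3].app[0]"
    index: int         # position in its own vector (NED: index)
    parent_index: int  # parent's position in its vector (NED: parentIndex())


def key_regex(pattern: str) -> "re.Pattern":
    """Simplified key pattern: '**' matches anything, '*' anything but '.'."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")
            i += 2
        elif pattern[i] == "*":
            parts.append("[^.]*")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


Rule = Tuple[str, Callable[[ModuleContext], str]]


def lookup(rules: List[Rule], ctx: ModuleContext, param: str) -> Optional[str]:
    """Scan top to bottom; the first key matching '<path>.<param>' wins."""
    full_name = f"{ctx.path}.{param}"
    for pattern, value in rules:
        if key_regex(pattern).fullmatch(full_name):
            return value(ctx)
    return None  # no line matched (outside the scope of this sketch)


rules: List[Rule] = [
    ("*.wirelessHostA[*].app[*].destAddresses",
     lambda ctx: "wirelessHostB[" + str(ctx.parent_index) + "]"),
    ("**.destAddresses", lambda ctx: ""),
]

app = ModuleContext(path="Net.wirelessHostA[10].app[0]", index=0, parent_index=10)
print(lookup(rules, app, "destAddresses"))  # wirelessHostB[10]
```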
This is something that has puzzled me for some time and I have yet to find an answer.
I am in a situation where I am applying a standardized data cleaning process to (supposedly) similarly structured files, one file for each year. I have a statement such as the following:
replace field="Plant" if field=="Plant & Machinery"
This was a result of writing the original code against the data file for year 1. I then generalized the code to loop through the years of data. The problem arises if, in year 3, the analogous value of that variable was coded as "Plant and MachInery ": the code line above would then silently fail to make the intended change because the text strings differ, without raising an error to alert me that the change was not made.
What I am after is some sort of confirmation, each time the code is executed in the loop, that more than 0 observations actually satisfied the condition, and an error otherwise. Trimming, removing spaces, and standardizing the text case, in any combination, are not workable options. At the same time, I don't want to add a count if and an assert statement before every conditional replace, as that becomes quite bulky.
Aside from going to the raw files to ensure the variable values are standardized, is there any way to do this validation "on the fly" as I have tried to describe? Maybe just write a custom program that combines a count if, assert and replace?
The idea has surfaced occasionally that replace should return the number of observations changed, but there are good reasons why not: notably, it is not an r-class or e-class command anyway, and it's quite important not to change the way it works, because that could break innumerable programs and do-files.
So I think the essence of any answer is that you have to set up your own monitoring process that counts how many values have been (or would be) changed.
One pattern is -- when working on a current variable:
clonevar was = current
foreach ... {
    ...
    replace was = current
    replace current = ...
    qui count if was != current
    assert r(N) > 0   // use the result, e.g. stop with an error if nothing changed
}
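The question also floats the idea of a custom program that combines count, assert and replace. As a language-neutral illustration of that idea, here is a Python analog (not Stata): `replace_checked` and `NoChangeError` are hypothetical names. It counts only real changes (old value differs from new), mirroring `count if was != current`, and raises when that count is zero, so a silently unmatched condition such as "Plant and MachInery " is caught.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


class NoChangeError(RuntimeError):
    """Raised when a conditional replace leaves every value untouched."""


def replace_checked(values: List[T], new_value: T,
                    condition: Callable[[T], bool], label: str = "") -> int:
    """Set values[i] = new_value wherever condition(values[i]) holds.

    Counts only real changes (old != new) and raises NoChangeError
    when that count is zero. Returns the number of changes.
    """
    changed = 0
    for i, old in enumerate(values):
        if condition(old) and old != new_value:
            values[i] = new_value
            changed += 1
    if changed == 0:
        raise NoChangeError(f"{label or 'replace'}: 0 observations changed")
    return changed


years = {
    1: ["Plant & Machinery", "Land", "Plant & Machinery"],
    3: ["Plant and MachInery ", "Land"],
}
for year, field in years.items():
    try:
        n = replace_checked(field, "Plant", lambda v: v == "Plant & Machinery",
                            label=f"year {year}")
        print(f"year {year}: {n} changed")
    except NoChangeError as err:
        print(err)
# year 1: 2 changed
# year 3: 0 observations changed
```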