How to get a makefile to run all commands, regardless of targets or dependencies - makefile

I am writing a GNUmakefile to create a workflow to analyse some biological sequence data. The data comes in a format called fastq, which then undergoes a number of cleaning and analysis tools. I have attached what I currently have written, which takes me all the way from quality control before cleaning and then quality control afterwards. My problem is that I'm not sure how to get the 'fastqc' commands to run, as its targets are not dependencies for any of the other steps in the workflow.
%_sts_fastqc.html %_sts_fastqc.zip: %_sts.fastq
# perform quality control after cleaning reads
	fastqc $^
%_sts.fastq: %_st.fastq
# trim reads based on quality
	sickle se -f $^ -t illumina -o $@
%_st.fastq: %_s.fastq
# remove contaminated reads
	tagdust -s adapters.fa $^
%_s.fastq: %.fastq
# trim adapters
	scythe -a <adapters.fa> -o $@ $^
%_fastqc.html %_fastqc.zip: %.fastq
# perform quality control before cleaning reads
	fastqc $^
%.fastq: %.sra
# convert .sra to .fastq
	fastq-dump $^

I believe adding these lines to the start of your Makefile will do what you are asking for:
SOURCES:=$(wildcard *.sra)
TARGETS:=$(SOURCES:.sra=_fastqc.html) $(SOURCES:.sra=_fastqc.zip)\
$(SOURCES:.sra=_sts_fastqc.html) $(SOURCES:.sra=_sts_fastqc.zip)
.PHONY: all
all: $(TARGETS)
What this does is grab all .sra files from the file system and build a list of targets by replacing the extension with whatever strings are necessary to produce the targets. (Note that since the html and zip targets are produced by the same command, I could list just one or the other, but I've decided to list both in case the rules change and the html and zip targets are ever produced separately.) Then it sets the phony all target to build all the computed targets. Here is a Makefile I've modified from yours by adding @echo everywhere, which I used to check that things were okay without having to run the actual commands in your Makefile. You could copy and paste it into a file to first check that everything is fine before modifying your own Makefile with the lines above. Here it is:
SOURCES:=$(wildcard *.sra)
TARGETS:=$(SOURCES:.sra=_fastqc.html) $(SOURCES:.sra=_fastqc.zip)\
$(SOURCES:.sra=_sts_fastqc.html) $(SOURCES:.sra=_sts_fastqc.zip)
.PHONY: all
all: $(TARGETS)
%_sts_fastqc.html %_sts_fastqc.zip: %_sts.fastq
# perform quality control after cleaning reads
	@echo fastqc $^
%_sts.fastq: %_st.fastq
# trim reads based on quality
	@echo sickle se -f $^ -t illumina -o $@
%_st.fastq: %_s.fastq
# remove contaminated reads
	@echo tagdust -s adapters.fa $^
%_s.fastq: %.fastq
# trim adapters
	@echo 'scythe -a <adapters.fa> -o $@ $^'
%_fastqc.html %_fastqc.zip: %.fastq
# perform quality control before cleaning reads
	@echo fastqc $^
%.fastq: %.sra
# convert .sra to .fastq
	@echo fastq-dump $^
I tested it here by running touch a.sra b.sra and then running make. It ran the commands for both files.

Instead of using patterns, I would use a 'define':
# 'all' is not a file
.PHONY: all
# a list of 4 samples
SAMPLES=S1 S2 S3 S4
# Define a macro named analyzefastq. It takes one argument, $(1). We need to protect the '$' of automatic variables (as '$$') for later expansion by $(eval).
define analyzefastq
# create a .st.fastq from the .fastq for file $(1)
$(1).st.fastq : $(1).fastq
	tagdust -s adapters.fa $$^
# create a .fastq from the .sra for file $(1)
$(1).fastq : $(1).sra
	fastq-dump $$^
endef
# all: the final target depends on every sample with the suffix '.st.fastq'
all: $(addsuffix .st.fastq,$(SAMPLES))
## Loop over each sample; the loop variable is 'S'. call and eval the previous macro with the sample name as its argument.
$(foreach S,${SAMPLES},$(eval $(call analyzefastq,$(S))))
I also use my tool jsvelocity https://github.com/lindenb/jsvelocity to generate large Makefile for NGS:
https://gist.github.com/lindenb/3c07ca722f793cc5dd60
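When debugging a define/eval construct like the one above, it can help to print the text that $(call) produces instead of evaluating it. The following is a minimal sketch, assuming GNU make and reusing the analyzefastq macro with two of the sample names; swapping $(eval) for $(info) is a general debugging trick and not part of the original answer.

```make
SAMPLES = S1 S2

define analyzefastq
$(1).st.fastq : $(1).fastq
	tagdust -s adapters.fa $$^
$(1).fastq : $(1).sra
	fastq-dump $$^
endef

# Debugging aid: $(info) prints the expanded rule text without defining any rule.
# Replace $(info ...) with $(eval ...) once the printed rules look right.
$(foreach S,$(SAMPLES),$(info $(call analyzefastq,$(S))))

# Placeholder target so that make has a goal.
.PHONY: all
all: ;
```

The printed text shows each `$$^` already reduced to `$^`, which is the form that $(eval) would then parse as a rule.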

Related

How can I build HTML with a Makefile with backlinks?

I am trying to statically build HTML files, each of which requires a markdown file and a meta file called "whatlinkshere" so that the HTML file can show its back links.
I believe it can be done efficiently with a Makefile, by first generating all the "whatlinkshere" files. I don't think this can be done in parallel, because the program that generates these files needs to append to the whatlinkshere files, and there could be race conditions that I am not quite sure how to solve.
Once the "whatlinkshere" files are generated then if a markdown file is edited, say foo.mdwn to point to bar.mdwn, only foo.mdwn needs to be analysed again for "whatlinkshere" changes. And finally only foo.html and bar.html need to be rebuilt.
I am struggling to accomplish this in my backlinks project.
INFILES = $(shell find . -name "*.mdwn")
OUTFILES = $(INFILES:.mdwn=.html)
LINKFILES = $(INFILES:.mdwn=.whatlinkshere)
all: $(OUTFILES)
# These need to be all made before the HTML is processed
$(LINKFILES): $(INFILES)
	@echo Creating backlinks $@
	@touch $@
	@go run backlinks.go $<
%.html: %.mdwn %.whatlinkshere
	@echo Deps $^
	@cmark $^ > $@
The current problem is that the *.whatlinkshere files aren't being generated on the first run. My workaround is for i in *.mdwn; do go run backlinks.go $i; done. Furthermore, they are not rebuilt as I want after editing a file, as described earlier. Something is horribly wrong. What am I missing?
I think I finally understood your problem. If I understood well:
You have a bunch of *.mdwn source files.
You generate *.whatlinkshere files from your *.mdwn source files using the backlinks.go utility. But this utility does not produce foo.whatlinkshere from foo.mdwn. It analyzes foo.mdwn, searches for links to other pages in it and, for each link to bar it finds, it appends a [foo](foo.html) reference to bar.whatlinkshere.
From each foo.mdwn source file you want to build a corresponding foo.html file with:
$ cmark foo.mdwn foo.whatlinkshere
Your rule:
$(LINKFILES): $(INFILES)
	@echo Creating backlinks $@
	@touch $@
	@go run backlinks.go $<
contains one error and has several drawbacks. The error is the use of the $< automatic variable in the recipe. It expands as the first prerequisite, that is probably always pageA.mdwn in your case. Not what you want. $^ expands as all prerequisites but it is not the correct solution because:
your go utility takes only one source file name, but even if it was accepting several...
...make will run the recipe several times, one per link file, which is a waste, and...
...as your go utility appends to the link files it will even be worse than a waste: back links will be counted several times each, and...
...if make runs in parallel mode (note that you can prevent this with make -j1 or by adding the .NOTPARALLEL: special rule to your Makefile, but it is a pity) there is a risk of race conditions.
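To see the difference between these automatic variables in isolation, here is a minimal sketch. The file names out.txt, a.txt, b.txt and c.txt are hypothetical and only serve the illustration; it assumes the three prerequisite files exist in the current directory.

```make
# Hypothetical file names, for illustration only.
out.txt: a.txt b.txt c.txt
	@echo 'target ($$@):             $@'
	@echo 'first prerequisite ($$<): $<'
	@echo 'all prerequisites ($$^):  $^'
```

Running make out.txt reports out.txt as the target, a.txt as the first prerequisite and a.txt b.txt c.txt as the full prerequisite list, which is why $< in a rule with many prerequisites only ever sees the first one.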
Important: the following works only with a flat organization where all source files and HTML files are in the same directory as the Makefile. Other organizations are possible, of course, but they would require some modifications.
First option using multi-targets pattern rules
One possibility is to use a special property of make pattern rules: when they have several targets make considers that one single execution of the recipe produces all targets. For instance:
pageA.w%e pageB.w%e pageC.w%e: pageA.mdwn pageB.mdwn pageC.mdwn
	for m in $^; do go run backlinks.go $$m; done
tells make that pageA.whatlinkshere, pageB.whatlinkshere and pageC.whatlinkshere are all generated by one execution of:
for m in pageA.mdwn pageB.mdwn pageC.mdwn; do go run backlinks.go $m; done
(make expands $^ as all prerequisites and $$m as $m). Of course, we want to automate the computation of the pageA.w%e pageB.w%e pageC.w%e pattern targets list. This should make it:
INFILES := $(shell find . -name "*.mdwn")
OUTFILES := $(INFILES:.mdwn=.html)
LINKFILES := $(INFILES:.mdwn=.whatlinkshere)
LINKPATTERN := $(INFILES:.mdwn=.w%e)
.PHONY: all clean
.PRECIOUS: $(LINKFILES)
all: $(OUTFILES)
# These need to be all made before the HTML is processed
$(LINKPATTERN): $(INFILES)
	@echo Creating backlinks
	@rm -f $(LINKFILES)
	@touch $(LINKFILES)
	@for m in $^; do go run backlinks.go $$m; done
%.html: %.mdwn %.whatlinkshere
	@echo Deps $^
	@cmark $^ > $@
clean:
	rm -f $(LINKFILES) $(OUTFILES)
Notes:
I declared all and clean as phony because... it is what they are.
I declared the whatlinkshere files as precious because (some of them) are considered by make as intermediates and without this declaration make would delete them after building the HTML files.
In the recipe for the whatlinkshere files I added rm -f $(LINKFILES) such that, if the recipe is executed, we restart from a clean state instead of concatenating new stuff to old (possibly outdated) references.
The pattern stem in the $(LINKPATTERN) can be anything but must match at least one character. I used w%e but whatlin%shere would work too. Use whatever is specific enough in your case. If you have a pageB.where file prefer whatlin%shere or what%here.
There is a drawback with this solution but it is due to your particular set-up: each time one single mdwn file changes it must be re-analyzed (which is normal) but any whatlinkshere file can be impacted. This is not predictable, it depends on the links that have been modified in this source file. But more problematic is the fact that the result of this analysis is appended to the impacted whatlinkshere files. They are not "edited" with the old content relative to this source file replaced by the new one. So, if you change just a comment in a source file, all its links will be appended again to the respective whatlinkshere files (while they are already there). This is probably not what you want.
This is why the solution above deletes all whatlinkshere files and re-analyzes all source files each time one single source file changes. And another negative consequence is that all HTML files must also be re-generated because all whatlinkshere files changed (even if their content did not really change, but make does not know this). If the analysis is super fast and you have a small number of mdwn files, it should be OK. Else it is sub-optimal but not easy to solve because of your particular set-up.
Second option using recursive make, separated back link files and marker files
There is a possibility, however, which consists in:
separating all back links references with one whatlinkshere file per from/to pair: foo.backlinks/bar.whatlinkshere contains all references to bar found in foo.mdwn,
using recursive make with one first invocation (when the STEP make variable is unset) to update all whatlinkshere files that need to be and a second invocation (STEP set to 2) to generate the HTML files that need to be,
using empty dummy files to mark that a foo.mdwn file has been analyzed: foo.backlinks/.done,
using the secondary expansion to be able to refer to the stem of a pattern rule in its list of prerequisites (and using $$ to escape the first expansion).
But it is probably a bit more difficult to understand (and maintain).
INFILES := $(shell find . -name "*.mdwn")
OUTFILES := $(INFILES:.mdwn=.html)
DONEFILES := $(patsubst %.mdwn,%.backlinks/.done,$(INFILES))
.PHONY: all clean
ifeq ($(STEP),)
all $(OUTFILES): $(DONEFILES)
	$(MAKE) STEP=2 $@
%.backlinks/.done: %.mdwn
	rm -rf $(dir $@)
	mkdir -p $(dir $@)
	cp $< $(dir $@)
	cd $(dir $@); go run ../backlinks.go $<; rm $<
	touch $@
else
all: $(OUTFILES)
.SECONDEXPANSION:
%.html: %.mdwn $$(wildcard *.backlinks/$$*.whatlinkshere)
	@echo Deps $^
	@cmark $^ > $@
endif
clean:
	rm -rf *.backlinks $(OUTFILES)
Even if it looks more complicated there are a few advantages with this version:
only outdated targets are rebuilt and only once each,
all whatlinkshere files are updated (if needed) before any HTML file is updated (if needed),
the whatlinkshere files can be built in parallel,
the HTML files can be built in parallel.
Third option using only recursive make and marker files
If you do not care about inaccurate results where back links persist in the results after they disappeared from the source files or where back links are uselessly replicated, we can reuse ideas from the previous solution but drop the separation in individual from/to whatlinkshere files.
INFILES := $(wildcard *.mdwn)
OUTFILES := $(patsubst %.mdwn,%.html,$(INFILES))
LINKFILES := $(patsubst %.mdwn,%.whatlinkshere,$(INFILES))
DONEFILES := $(patsubst %.mdwn,.%.done,$(INFILES))
.PHONY: all clean
.PRECIOUS: $(LINKFILES)
ifeq ($(STEP),)
.NOTPARALLEL:
all $(OUTFILES): $(DONEFILES)
	$(MAKE) STEP=2 $@
.%.done: %.mdwn
	go run backlinks.go $<
	touch $@
else
all: $(OUTFILES)
%.html: %.mdwn %.whatlinkshere
	@echo Deps $^
	@cmark $^ > $@
%.whatlinkshere:
	touch $@
endif
clean:
	rm -f $(OUTFILES) $(LINKFILES) $(DONEFILES)
Notes:
As this works only for a flat organization I replaced the $(shell find...) by the make built-in $(wildcard ...).
I used patsubst instead of the old syntax but it's just a matter of taste.
The %.whatlinkshere: rule is a default rule to create the missing empty whatlinkshere files.
The .NOTPARALLEL: special target prevents parallel execution when building the whatlinkshere files.

Makefile for dotfiles (graphviz)

As part of generating a PDF from a latex file, I got a makefile from tex.stackexchange.com.
# You want latexmk to *always* run, because make does not have all the info.
# Also, include non-file targets in .PHONY so they are run regardless of any
# file of the given name existing.
.PHONY: paper-1.pdf all clean
# The first rule in a Makefile is the one executed by default ("make"). It
# should always be the "all" rule, so that "make" and "make all" are identical.
all: paper-1.pdf
# CUSTOM BUILD RULES
# In case you didn't know, '$@' is a variable holding the name of the target,
# and '$<' is a variable holding the (first) dependency of a rule.
# "raw2tex" and "dat2tex" are just placeholders for whatever custom steps
# you might have.
%.tex: %.raw
	./raw2tex $< > $@
%.tex: %.dat
	./dat2tex $< > $@
# MAIN LATEXMK RULE
# -pdf tells latexmk to generate PDF directly (instead of DVI).
# -pdflatex="" tells latexmk to call a specific backend with specific options.
# -use-make tells latexmk to call make for generating missing files.
# -interaction=nonstopmode keeps the pdflatex backend from stopping at a
# missing file reference and interactively asking you for an alternative.
paper-1.pdf: paper-1.tex
	latexmk -bibtex -pdf -pdflatex="pdflatex -interaction=nonstopmode" -use-make paper-1.tex
clean:
	latexmk -bibtex -CA
My figures are .dot files that I turn into PNG files. I can make the PNGs with some basic shell commands, but it doesn't make sense to use a shell script because you lose the advantages of make.
Here's what I've been trying after reading some documentation.
%.png: %.dot
	dot -Tpng $(.SOURCE) -o $(.TARGET)
and
.dot.png:
	dot -Tpng $(.SOURCE) -o $(.TARGET)
However, whenever I try to run the target directly, the terminal prints:
dot -Tpng -o
and it hangs, waiting for input on STDIN because no input file was given.
If I try to invoke the rule by running make *.dot I get the output:
make: Nothing to be done for `figure-1a.dot'.
make: Nothing to be done for `figure-1b.dot'.
I'm clearly not understanding what I need to do. How do I get the makefile to take all the .dot files and create .png files every time I run through the creation of the PDF?
UPDATE: Here is another attempt I tried
graphs := $(wildcard *.dot)
.dot.png: $(graphs)
	dot -Tpng $(.SOURCE) -o $(.TARGET).png
GNU make uses $< and $@, not .SOURCE and .TARGET. The rules should be:
.PHONY: all
all: $(patsubst %.dot,%.png,$(wildcard *.dot))
%.png: %.dot
	dot -Tpng $< -o $@
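To have the figures rebuilt as part of producing the PDF, as the question asks, one option is to list the PNG files as prerequisites of the PDF target. This is a sketch under assumptions: the variable name FIGURES is hypothetical, and the latexmk command line is copied unchanged from the question.

```make
# FIGURES is a hypothetical variable name: one .png per .dot file found.
FIGURES := $(patsubst %.dot,%.png,$(wildcard *.dot))

.PHONY: all paper-1.pdf
all: paper-1.pdf

# Listing the figures as prerequisites makes make bring them up to date
# before latexmk runs.
paper-1.pdf: paper-1.tex $(FIGURES)
	latexmk -bibtex -pdf -pdflatex="pdflatex -interaction=nonstopmode" -use-make paper-1.tex

%.png: %.dot
	dot -Tpng $< -o $@
```

With this layout a plain make regenerates only the PNG files whose .dot source is newer, then hands over to latexmk.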

GNU Make get the list of all files in a directory that were generated by previous rule

I am looking for a Makefile macro to get the list of all files in a directory that were generated while processing rule 1, and to use this list when processing rule 2.
Here's what I am trying to achieve :
Rule 1: Generate source .c files (using xml files) and place them in $(MYDIR) directory.
Rule 2: Get the list of all files in $(MYDIR) and create object files and place them in $(OBJDIR).
Problem is, I want to update list of files in Rule2 after Rule 1 has been processed, else list of files in $(MYDIR) will be empty.
all : rule_1 rule_2
rule1 : $(MYDIR)/generated_source1.c $(MYDIR)/generated_source2.c
$(MYDIR)/generated_source1.c:
	xsltproc generator1.xml style_generator.xsl -o $(MYDIR)/generated_source_1.c
$(MYDIR)/generated_source2.c:
	xsltproc generator2.xml style_generator.xsl -o $(MYDIR)generated_source_2.c
#Get list of all $(MYDIR).*c , create corresponding $(OBJDIR)/*.o list.
SOURCES := $(wildcard $(MYDIR)/*.c)
OBJECTS := $(notdir ${SOURCES})
GENERATED_OBJS := $(patsubst %.c,$(OBJDIR)/%.o,$(OBJECTS))
#This rule is compiling of all .c generated in rule1.
rule2 : $(GENERATED_OBJS)
	ld -r -o $(OBJDIR)/generated_lib.o $(GENERATED_OBJS)
$(OBJDIR)/%.o: $(MYDIR)/%.c
	gcc $(CFLAGS) -c -o $@ $<
$(SOURCES) is shown empty, but actually it should contain generated_source1.c and generated_source2.c
I am not sure how .SECONDEXPANSION rule will work for my case.
You can't really (and don't really want to) play around with getting make to re-evaluate file existence during the running of the make process.
What you want to do is track the files from start to finish in make and then you have all your lists.
You can start from either end, but starting with the initial source tends to be easier.
So start with
MYDIR:=dir
OBJDIR:=obj
XML_SOURCES := $(wildcard $(MYDIR)/*.xml)
then translate from there to the generated source files
SOURCES := $(subst generator,generated_source,$(XML_SOURCES:.xml=.c))
and from there to the generated object files
GENERATED_OBJS := $(patsubst $(MYDIR)/%.c,$(OBJDIR)/%.o,$(SOURCES))
At which point you can define the default target
all: $(OBJDIR)/generated_lib.o
and then define the rules for each step
$(MYDIR)/%.c:
	cat $^ > $@
$(OBJDIR)/%.o: $(MYDIR)/%.c
	cat $^ > $@
$(OBJDIR)/generated_lib.o: $(GENERATED_OBJS)
	ld -r -o $@ $^
The $(MYDIR)/%.c rule needs a bit of extra magic to actually work correctly. You need to define the specific input/output pairs so that they are used correctly by that rule.
$(foreach xml,$(XML_SOURCES),$(eval $(subst generator,generated_source,$(xml:.xml=.c)): $(xml)))
This .xml to .c step would be easier if the input and output files shared a basename as you could then just use this and be done.
%.c: %.xml
	cat $^ > $@

Iterate to next prerequisite file when matching when given variables for statement

I'm trying to do something like this with make:
SRC := $(src/*.md)
DIST := $(subst -,/,$(patsubst src/%.md, dist/%/index.html, $(SRC)))
all: $(DIST)
$(DIST): $(SRC)
	mkdir -p $(@D) && pandoc $< -o $@
E.g., the prerequisite src/2014-04-myfile.md is put into target dist/2014/04/myfile/index.html with the transform pandoc
But when I use $< it only refers to the first argument in the $(SRC) variable.
I know normally we would do something like:
dist/%.html: src/%.md
but since I changed the file name in the output to just index.html for all files and used the original file name to create a new path I'm not sure how to go about iterating over the prerequisites.
Here's one way it could be done. The way this works is that it iterates over $(SRC) to create one rule per source file. The $$ in MAKE_DEP are necessary to prevent make from interpreting the functions when it first reads the contents of MAKE_DEP. The documentation on call and eval is also useful.
SRC := $(wildcard src/*.md)
# Set the default goal if no goal has been specified...
.DEFAULT_GOAL:=all
#
# This is a macro that we use to create the rules.
#
define MAKE_DEP
# _target is a temporary "internal" variable used to avoid recomputing
# the current target multiple times.
_target:=$$(subst -,/,$$(patsubst src/%.md, dist/%/index.html, $1))
# Add the current target to the list of targets.
TARGETS:=$$(TARGETS) $$(_target)
# Create the rule proper.
$$(_target):$1
	mkdir -p $$(@D) && pandoc $$< -o $$@
endef # MAKE_DEP
# Iterate over $(SRC) to create each rule.
$(foreach x,$(SRC),$(eval $(call MAKE_DEP,$x)))
.PHONY: all
all: $(TARGETS)
If I create:
src/2000-01-bar.md
src/2014-04-foo.md
and run $ make -n, I get:
mkdir -p dist/2000/01/bar && pandoc src/2000-01-bar.md -o dist/2000/01/bar/index.html
mkdir -p dist/2014/04/foo && pandoc src/2014-04-foo.md -o dist/2014/04/foo/index.html
This could also be done using secondary expansion but it did not appear to me to be simpler or nicer.
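For comparison, here is a minimal sketch of what a secondary-expansion variant might look like. It assumes GNU make 3.81 or later and the same src/YYYY-MM-name.md layout as above, and it relies on a static pattern rule whose stem ($*) is available during the second expansion. Like the version above, it assumes hyphens in file names only ever separate path components.

```make
.SECONDEXPANSION:

SRC     := $(wildcard src/*.md)
TARGETS := $(subst -,/,$(patsubst src/%.md,dist/%/index.html,$(SRC)))

.PHONY: all
all: $(TARGETS)

# For dist/2014/04/foo/index.html the stem is 2014/04/foo; during the second
# expansion $$(subst /,-,$$*) turns it back into 2014-04-foo.
$(TARGETS): dist/%/index.html: src/$$(subst /,-,$$*).md
	mkdir -p $(@D) && pandoc $< -o $@
```

The trade-off is a shorter Makefile at the cost of a less obvious prerequisite expression, which is in line with the answer's remark that it is not clearly simpler.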

Makefile is skipping certain dependencies

So I am writing a makefile that will take some files (*.in) as input to my C++ program and compare their output (results.out) to given correct output (*.out).
Specifically I have files t01.in, t02.in, t03.in, t04.in, and t05.in.
I have verified that $TESTIN = t01.in t02.in t03.in t04.in t05.in.
The problem is that it seems to run the %.in: %.out block only for three of these files, 1,3, and 4. Why is it doing this?
OUTPUT = chart
COMPILER = g++
SOURCES = chart.cpp
HEADERS =
OBJS = $(SOURCES:.cpp=.o)
TESTIN = tests/*.in
all: $(OUTPUT)
$(OUTPUT): $(OBJS)
	$(COMPILER) *.o -o $(OUTPUT)
%.o: %.cpp
	clear
	$(COMPILER) -c $< -o $@
test: $(TESTIN)
%.in: %.out
	./$(OUTPUT) < $@ > tests/results.out
	printf "\n"
ifeq ($(diff $< tests/results.out), )
	printf "\tTest of "$@" succeeded for stdout.\n"
else
	printf "\tTest of "$@" FAILED for stdout!\n"
endif
Additionally, if there is a better way of accomplishing what I am trying to do, or any other improvements I could make to this makefile (as I am rather new at this), suggestions would be greatly appreciated.
EDIT: If I add a second dependency to the block (%.in: %.out %.err), it runs the block for all five files. Still no idea why it works this way but not the way before.
First, I don't see how TESTIN can be correct. This line:
TESTIN = tests/*.in
is not a valid wildcard statement in Make; it simply gives the variable TESTIN the literal value tests/*.in. But let's suppose it has the value t01.in t02.in t03.in t04.in t05.in or tests/t01.in tests/t02.in tests/t03.in tests/t04.in tests/t05.in, or wherever these files actually are.
Second, as @OliCharlesworth points out, this rule:
%.in: %.out
...
is a rule for building *.in files, which is not what you intend. As for why it runs some tests and not others, here is my theory:
The timestamp of t01.out is later than that of t01.in, so Make decides that it must "rebuild" t01.in; likewise t03.in and t04.in. But the timestamp of t02.out is earlier than that of t02.in, so Make does not attempt to "rebuild" t02.in; likewise t05.in. The timestamps of t02.err and t05.err are later than those of t02.in and t05.in, respectively, so when you add the %.err prerequisite, Make runs all tests. You can test this theory by checking the timestamps and experimenting with touch.
Anyway, let's rewrite it. We need a new target for a new rule:
TESTS := $(patsubst %.in,test_%,$(TESTIN)) # test_t01 test_t02 ...
.PHONY: $(TESTS) # because there will be no files called test_t01, test_t02,...
$(TESTS): test_%: %.in %.out
	./$(OUTPUT) < $< > tests/results.out
Now for the conditional. Your attempted conditional is in Make syntax; Make will evaluate it before executing any rule, so tests/results.out will not yet exist, and variables like $< will not yet be defined. We must put the conditional inside the command, in shell syntax:
$(TESTS): test_%: %.in %.out
	./$(OUTPUT) < $< > tests/results.out
	if diff $*.out tests/results.out >/dev/null; then \
echo Test of $* succeeded for stdout.; \
else echo Test of $* FAILED for stdout!; \
fi
(Note that only the first line of the conditional must begin with a TAB.)
