Keep intermediate files defined by wildcards in a Makefile

I've defined a series of data-processing steps with a Makefile but find that the files belonging to the intermediate steps are deleted by Make. In the following example, the files processed_%.txt are always deleted.
#make some simple data
#echo "test data X" > test_x.txt
#echo "test data y" > test_y.txt
x = test_x.txt
y = test_y.txt
#these are deleted
processed_%.txt: ${x} ${y}
	cat $< > $@
#these remain in the directory
processed_again_%.txt: processed_%.txt
	cat $< > $@
all: processed_again_x.txt processed_again_y.txt
Can anyone explain what is happening and how to disable/control this behavior?
thanks,
zachcp

This is how chains of implicit rules work. From the GNU make manual:
The second difference is that if make does create b in order to update something else, it deletes b later on after it is no longer needed. Therefore, an intermediate file which did not exist before make also does not exist after make. make reports the deletion to you by printing a ‘rm -f’ command showing which file it is deleting.
You can control this behavior by marking the files as .SECONDARY. Again from the manual:
You can prevent automatic deletion of an intermediate file by marking it as a secondary file. To do this, list it as a prerequisite of the special target .SECONDARY. When a file is secondary, make will not create the file merely because it does not already exist, but make does not automatically delete the file. Marking a file as secondary also marks it as intermediate.
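Applied to the Makefile in the question, that could look like the sketch below. The file names are the ones from the question; listing processed_x.txt and processed_y.txt explicitly is an assumption about which intermediates should be kept. Recipe lines must start with a tab.

```makefile
x = test_x.txt
y = test_y.txt

all: processed_again_x.txt processed_again_y.txt

# Files listed here are treated as secondary, so make does not
# delete them after building the processed_again_* files.
.SECONDARY: processed_x.txt processed_y.txt

processed_%.txt: $(x) $(y)
	cat $< > $@

processed_again_%.txt: processed_%.txt
	cat $< > $@
```

If every intermediate file should be kept, .SECONDARY can also be written with no prerequisites at all, in which case make treats all targets as secondary.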

Related

GNU make: how to rebuild sentinel targets when a generated file is deleted?

A code generator is executed from GNU make. The generator produces several files (depending on the input) and only touches a file when its content changes. A sentinel target is therefore needed to record the generator execution time:
GEN_READY : $(gen_input_files)
	gen.exe $(gen_input_files)
	touch GEN_READY
$(gen_output_files): GEN_READY
It works well, except when a generated file is deleted, but the sentinel file is left in place. Since the sentinel is there, and it's up-to-date, the generator is not executed again.
What is the proper solution to force make to re-run the generator in this case?
Here is one way to group them using an archive:
# create archive of output files from input files passed through gen.exe
GEN_READY.tar: $(gen_input_files)
	@echo Generate the files
	gen.exe $^
	@echo Put generated files in archive
	tar -c -f $@ $(gen_output_files)
	@echo 'Remove intermediate files (recreated by next recipe)'
	rm $(gen_output_files)
# Extracting individual files for use as prerequisite or restoration
$(gen_output_files): GEN_READY.tar
	@echo Extract one member
	tar -x -f $< $@
Since tar (and zip, for that matter) allows duplicate entries, it may be possible to update or append files in the archive instead of rewriting it, if the input-output relation allows.
Edit: Simplified solution.
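Another possibility, sketched here as an assumption rather than taken from the answer above, keeps the plain sentinel file and gives the output files a recipe that checks whether the file is really there. If it is missing, the recipe removes the sentinel and asks a sub-make to rebuild it, which re-runs the generator. It relies on gen.exe recreating missing outputs, and it is not safe for parallel builds without further care.

```makefile
# gen_input_files and gen_output_files are assumed to be defined
# earlier in the makefile, as in the question.
GEN_READY: $(gen_input_files)
	gen.exe $(gen_input_files)
	touch GEN_READY

# If an output file has been deleted, drop the sentinel and
# rebuild it so that the generator runs again.
$(gen_output_files): GEN_READY
	@test -f $@ || { rm -f GEN_READY; $(MAKE) GEN_READY; }
```

When several outputs are missing, the first recipe to run regenerates all of them and the remaining recipes find their file present and do nothing.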

Makefile rule only works if file exists before make is invoked

Consider the following (MCVE of a) Makefile:
my_target: prepare test.bin
prepare:
	echo >test.dat
%.bin: %.dat
	cp $? $@
If you run make in a clean directory, it fails:
echo >test.dat
make: *** No rule to make target 'test.bin', needed by 'my_target'. Stop.
Run it again and it succeeds:
echo >test.dat
cp test.dat test.bin
What seems to happen is that the rule to make *.bin from *.dat only recognises that it knows how to make test.bin if test.dat exists before anything is executed, even though according to the output it has already created test.dat before it tries to create test.bin.
This is inconvenient for me as I have to prepare a few files first (import them from a different, earlier part of the build).
Is there a solution? Perhaps some way to allow the rules to be (re)evaluated in the light of the files which are now present?
There are a number of issues with your makefile. However based on your comments I'm inclined to assume that the MCVE here is just a little too "M" and it's been reduced so much that it has a number of basic problems. So I won't discuss them, unless you want me to.
The issue here is that you're creating important files without indicating to make that that's what you're doing. Make keeps internally a cache of the contents of directories that it's worked with, for performance reasons, and that cache is only updated when make invokes a rule that it understands will modify it.
Here your target is prepare but the recipe actually creates a completely different file, test.dat. So, make doesn't modify its internal cache of the directory contents and when it checks the cache to see if the file test.dat exists, it doesn't.
You need to be sure that your makefile is written such that it doesn't trick make: if a recipe creates a file foo then the target name should be foo, not bar.
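A minimal sketch of that advice applied to the MCVE, assuming the preparation step really is just the echo from the question: the rule that creates test.dat is named test.dat, so make can chain it with the pattern rule in a clean directory.

```makefile
my_target: test.bin

# The target is named after the file its recipe creates,
# so make knows test.dat exists once this rule has run.
test.dat:
	echo > $@

%.bin: %.dat
	cp $< $@
```

Because test.dat is mentioned explicitly as a target, make does not treat it as an intermediate file and leaves it in place after the build.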
This happens for wildcard targets, like %.bin. They get evaluated at the first pass. You could add an explicit target of test.bin. Or, follow the advice of tkausl and have test.dat depend on prepare (a phony target). In this case, you don't need the double dependency anymore:
my_target: test.bin
You have to write
test.dat: prepare
or (when you want to stay with wildcards)
%.dat: prepare
	@:
Usually, you might want to create and use .stamp files instead of a prepare target.

Target not known beforehand in the Makefile

I am trying to use a makefile to manage the build process in a small project, where the number and names of the targets are not known beforehand but depend on the input. Specifically, I want to generate a bunch of data files (say .csv files) according to a cities_list.txt file with a list of city names inside. For example, if the contents of the txt file are:
newyork
washington
toronto
then a script called write_data.py would generate three files called newyork.csv, washington.csv and toronto.csv. When the content of the cities_list.txt file changes, I want make to deal with this change cleverly, i.e. only update the files for newly added cities.
I was trying to define variable names in target names to make this happen but didn't succeed. I'm now trying to create a bunch of intermediate .name files as below:
all: *.csv
%.name: cities_list.txt
	/bin/bash gen_city_files.sh $<
%.csv: %.name write_data.py
	python3 write_data.py $<
clean:
	rm *.name *.csv
This seems to be very close to success, but it only gives me one .csv file. The reason is obvious, because make can't determine what files should be generated for the all target. How can I let make know that this *.csv should contain all the files where there exists a corresponding *.name file? Or is there any better way to achieve what I wanted to do here?
All right, this should do it. We'd like a variable assignment at the head of the file:
CITY_FILES := newyork.csv washington.csv toronto.csv
There are two ways to do this. This way:
-include cities.mak
# this rule can come later in the makefile, near the bottom
# (it writes one "CITIES += name" line per city, so every name is kept)
cities.mak: cities_list.txt
	@sed 's/^/CITIES += /' $< > $@
and this way:
CITIES := $(shell cat cities_list.txt)
After we've done one of those two, we can construct the list of needed files:
CITY_FILES := $(addsuffix .csv, $(CITIES))
and build them:
# It is convenient to have this be the first rule in the makefile.
all: $(CITY_FILES)
%.csv: write_data.py
	python3 $< $*.name
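Put together, the $(shell ...) variant might look like the sketch below. It assumes that write_data.py accepts a bare city name as its argument and writes the matching .csv file; that interface is a guess, so adjust the recipe to whatever the script really expects.

```makefile
# Read the city names (one per line) each time make starts.
CITIES     := $(shell cat cities_list.txt)
CITY_FILES := $(addsuffix .csv,$(CITIES))

all: $(CITY_FILES)

# Assumption: write_data.py takes a city name and writes <city>.csv.
%.csv: write_data.py
	python3 $< $*
```

With this layout, adding a city to cities_list.txt only builds the new .csv file, since the existing ones are still newer than write_data.py.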

Makefile where target names unknown

I'm trying to write a Makefile where multiple source files (in my case they are markdown) create multiple target files (pdfs). However, the target files generated have extra characters in the file name that can't be predicted (it happens to be a version number encoded in the source), but ideally the Makefile would not have to read the source itself.
So, for example:
file1.md => file1-v1.pdf
file2.md => file2-v2.pdf
...
I can calculate source name given a target name (by excluding anything after the hyphen and adding .md), but cannot calculate target name given the source.
Is it possible to write a Makefile that builds only the targets whose sources have been updated?
This will be ugly, but it will work.
As it often is with Make, our problem divides into these two problems:
1. construct a list of targets
2. build them
Suppose we have five md files which map to pdf files (whose names we don't know beforehand):
file1.md => file1-v1.pdf
file2.md => file2-v1.pdf
file3.md => file3-v1.pdf
file4.md => file4-v1.pdf
file5.md => file5-v1.pdf
We can't use the real output file names as targets, because we don't know them beforehand, but we see five input files and know that we must build one output file for each. For now, a fake target name will do:
file1-dummy.pdf: file1.md
	zap file1.md
When Make executes this rule, it produces the file file1-v1.pdf. The fact that it doesn't produce a file named file1-dummy.pdf is disquieting, but not a serious problem. We can turn this into a pattern rule:
%-dummy.pdf: %.md
	zap $<
Then all we have to do is turn the list of existing input files (file1.md, file2.md, ...) into a list of dummy targets (file1-dummy.pdf, file2-dummy.pdf, ...), and build them. So far, so good.
But suppose some of the output files already exist. If file2-v2.pdf already exists -- and is newer than file2.md -- then we would prefer that Make not rebuild it (by attempting to build file2-dummy.pdf). In that case we would prefer that file2-v2.pdf be in the target list, with a rule that worked like this:
file2-v2.pdf: file2.md
	zap $<
This is not easy to turn into a pattern rule, because Make does not handle wildcards very well, and cannot cope with multiple wildcards in a single phrase, not without a lot of clumsiness. But there is a way to write one rule that will cover both cases. First note that we can obtain the part of a variable before the hyphen with this kludge:
$(basename $(subst -,.,$(VAR)))
Armed with this, and with secondary expansion, we can write a pattern rule that will work with both cases, and construct a target list that will exploit it:
# There are other ways to construct these two lists, but this will do.
MD := $(wildcard *.md)
PDF := $(wildcard *.pdf)
PDFROOTS := $(basename $(subst -,.,$(basename $(PDF))))
MDROOTS := $(filter-out $(PDFROOTS), $(basename $(MD)))
TARGETS:= $(addsuffix -foo.pdf, $(MDROOTS)) $(PDF)
.SECONDEXPANSION:
%.pdf: $$(basename $$(subst -,., $$*)).md
# perform actions on $<
Make's algorithm always starts with the final output product and works its way backwards to the source files, to see what needs to be updated.
Therefore, you HAVE to be able to enumerate the final output product as a target name and correlate that back to the inputs that generate that output, for make to work.
This is also why make is not a great tool for building Java, for example, since the output filenames don't map easily to the input file names.
So, you must have at least one target/prerequisite pair which is derivable (for implicit rules), or state-able (for explicit rules)--that is, known at the time you write the makefile. If you don't then a marker file is your only alternative. Note you CAN add extra generated, non-derivative prerequisites (for example, in compilers you can add header files as prerequisites that are not related to the source file name), in addition to the known prerequisite.
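A marker-file version could look like the sketch below. It reuses zap, the placeholder command from the earlier answer, and the .stamp suffix is an arbitrary choice. Each stamp file records when its source was last converted, so make never needs to know the real PDF name.

```makefile
MD     := $(wildcard *.md)
STAMPS := $(MD:.md=.stamp)

all: $(STAMPS)

# zap writes the PDF under a name make cannot predict;
# the stamp file records when that last happened.
%.stamp: %.md
	zap $<
	touch $@
```

The trade-off is that make only tracks the stamps: if a PDF is deleted while its stamp remains, make will not notice.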
@Beta's answer is informative and helpful, but I needed a solution (using GNU Make 4.1) that worked when the destination filename bears no resemblance to the input filename, for example, if it is generated from its content. I came up with the following, which takes every file matching *.in and creates a file by reading the contents of the source file, appending .txt, and using the result as the name of the file to create. (For example, if test.in exists and contains foo, the makefile will create a foo.txt file.)
SRCS := $(wildcard *.in)
.PHONY: all
all: all_s
define TXT_template =
$(2).txt: $(1)
	touch $$@
ALL += $(2).txt
endef
$(foreach src,$(SRCS),$(eval $(call TXT_template, $(src), $(shell cat $(src)))))
.SECONDEXPANSION:
all_s: $(ALL)
The explanation:
The define block defines the rule needed to make the text file from the .in file. It's a function that takes two parameters; $(1) is the .in file and $(2) is its contents, i.e. the base of the output filename. Replace touch with whatever makes the output. We have to use $$@ because eval will expand everything once, but we want $@ to be left after this expansion. Since we have to collect all the generated targets so we know what all to make, the ALL line accumulates the targets into one variable. The foreach line goes through each source file, calls the function with the source filename and the contents of the file (i.e. what we want to be the name of the target; here you'd normally use whatever script generates the desired filename), and then evaluates the resulting block, dynamically adding the rule to make. Thanks to Beta for explaining .SECONDEXPANSION; I needed it for reasons not entirely clear to me, but it works (putting all: $(ALL) at the top doesn't work). The all: at the top depends on the secondary expansion of all_s: at the bottom and somehow this magic makes it work. Comments welcome.
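A likely reason that all: $(ALL) at the top does not work is that make expands a rule's prerequisite list at the moment it reads the rule, and at that point ALL is still empty. The sketch below follows that reading: it declares all early so that it stays the default goal, and attaches the prerequisites only after the foreach has filled ALL. It needs neither all_s nor secondary expansion; touch remains a stand-in for the real command.

```makefile
SRCS := $(wildcard *.in)

.PHONY: all
# Declared first so that "all" is the default goal;
# its prerequisites are added at the bottom.
all:

define TXT_template
$(2).txt: $(1)
	touch $$@
ALL += $(2).txt
endef

$(foreach src,$(SRCS),$(eval $(call TXT_template,$(src),$(shell cat $(src)))))

# ALL is fully populated by now, so this expands to every target.
all: $(ALL)
```

Make merges the two all: lines, because several rules without recipes for the same target simply add up their prerequisites.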
Maybe try this, or something along those lines:
# makefile
SRCS=$(wildcard *.md)
PDFS=$(shell printf *.pdf)
$(PDFS): $(SRCS)
	command ...
The printf *.pdf is meant to expand to the first of the PDF files if any exist, or else fail, which will signal to make that it should build. If this doesn't work, I suggest experimenting with find, ls or other listing tools (e.g. compgen, complete), maybe even in combination with xargs to get everything on one line.

Make deletes my target. Why?

From the docs:
The second difference is that if make does create B in order to
update something else, it deletes B later on after it is no longer
needed. Therefore, an intermediate file which did not exist before
make also does not exist after make. make reports the deletion to
you by printing a rm -f command showing which file it is deleting.
Now, a makefile, like this:
$(shell rm -rf x D)
$(shell mkdir D)
$(shell touch D/x)
VPATH = D
all: x ;
x:: phony
	@echo '$@'
.INTERMEDIATE: D/x
.PHONY: phony
Running:
$ make
D/x
rm D/x
$ ls D/x
ls: cannot access D/x: No such file or directory
Now, given the above quote, that Make removes only an "intermediate file which did not exist before", we have here a clear case, where:
The target did exist before running Make.
Make did not create the target.
Still, Make feels that it is appropriate to delete this file. A thing that it was not asked to do. So why does it?
Compare it to the following simple makefile:
$(shell rm -rf x)
$(shell touch x)
x ::
	@echo '$@'
.INTERMEDIATE: x
Running:
$ make
x
$ ls x
x
Simple as that! Make did not remove a file that:
It did not create!
Existed, before running Make
Because the whole point of intermediate file removal is, to quote the documentation above:
Therefore, an intermediate file which did not exist before make also does not exist after make.
Nothing else!
So how come, in the first example, Make goes around and deletes pre-existing files?
Description of Execution
In your first example, D/x did exist before executing make, but because you have specified phony as a prerequisite, make must remake D/x, since the target phony does not exist. So make created the intermediate file D/x and, according to your quote, it
deletes D/x when it is no longer needed.
after having remade all. The path of execution make takes is
all does not exist, so we must check x
make finds D/x but phony does not exist
make remakes phony and now we must remake D/x as it depends on phony which has been updated
make remakes the intermediate file D/x (which we must now delete when it is no longer needed)
make can now create all, and finally
make deletes the intermediate file D/x
Relation to Documentation
The key part of the documentation you have quoted is the first sentence.
The second difference is that if make does create B in order to update something else, it deletes B later on after it is no longer needed.
This is exactly what happens in your example. make creates D/x and so it must delete D/x later on when it is not needed.
Explanation of Documentation Interpretation
I think you need to be a little careful of your interpretation of the documentation. By stating 'Therefore' they are not saying make will not delete a file that already existed. They are simply saying as a consequence of make updating an intermediate file which did not previously exist it will delete it when it is finished. They are still leaving the possibility that an intermediate file existed before make was executed that it had to update and subsequently deletes.
I think I should stress this a bit more. Given the quote
Therefore, an intermediate file which did not exist before make also does not exist after make.
this does not mean make won't delete files that existed beforehand.
