Why does "make" delete target files only when they are built by implicit rules?

Suppose I have a Makefile like this:
B1.txt: A1.txt
	python big_long_program.py A1.txt > $@
correct1.txt: B1.txt reference.txt
	diff -q B1.txt reference.txt
	touch $@
Then the output when I run make correct1.txt is pretty much what I would expect:
python big_long_program.py A1.txt > B1.txt
diff -q B1.txt reference.txt
touch correct1.txt
Now suppose I have lots of files (B1.txt, B2.txt, B3.txt, etc.), so I create implicit (pattern) rules:
B%.txt: A%.txt
	python big_long_program.py A$*.txt > $@
correct%.txt: B%.txt reference.txt
	diff -q B$*.txt reference.txt
	touch $@
Instead, this is what happens when I run make correct1.txt:
python big_long_program.py A1.txt > B1.txt
diff -q B1.txt reference.txt
touch correct1.txt
rm B1.txt
That is, the difference is that the file B1.txt has now been deleted, which in many cases is really bad.
So why are implicit rules different? Or am I doing something wrong?

You are not doing anything wrong. The behavior you observe is documented in section 10.4, Chains of Implicit Rules, of the GNU make manual, which states that intermediate files are indeed treated differently:
The second difference is that if make does create b in order to update something else, it deletes b later on after it is no longer needed. Therefore, an intermediate file which did not exist before make also does not exist after make. make reports the deletion to you by printing a rm -f command showing which file it is deleting.
The documentation does not explicitly explain why it behaves like this. However, the file ChangeLog.1 contains a reference to the remove_intermediates function as far back as 1988, when disk space was expensive and at a premium.
If you do not want this behavior, either mention the files you want to keep somewhere in the makefile as an explicit target or prerequisite, or list them as prerequisites of the special built-in target .PRECIOUS or .SECONDARY.
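A minimal sketch of the .PRECIOUS route for the pattern rules in the question, assuming GNU make: listing the target pattern B%.txt as a prerequisite of .PRECIOUS keeps every B<n>.txt that the rule builds. (big_long_program.py and reference.txt are taken from the question.)

```make
# Listing the target *pattern* keeps every file built by the B%.txt rule
# instead of deleting it once correct%.txt has been made.
.PRECIOUS: B%.txt

# $< is the first prerequisite (A1.txt for B1.txt); $@ is the target.
B%.txt: A%.txt
	python big_long_program.py $< > $@

correct%.txt: B%.txt reference.txt
	diff -q $< reference.txt
	touch $@
```

With this in place, make correct1.txt should no longer end with rm B1.txt. Note that .PRECIOUS also keeps a half-written B1.txt if make is interrupted. The GNU make manual describes pattern prerequisites only for .PRECIOUS; a bare .SECONDARY: with no prerequisites instead treats every target as secondary.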
With thanks to MadScientist for the additional comments.

Related

Makefile with multiple output rule not rebuilding nested dependency when run in parallel

I have a makefile with a rule that produces multiple outputs. To work around the issue of this rule often being run multiple times when run in parallel, I've used a dummy "timestamp file". I also have a rule that depends on one of the outputs of this "multi-output" rule.
When all this is run from a clean state, it all works fine. However, if the source of the multi-output rule is updated, the other rule is not run, until Make is run again.
I've looked at the debug output, but haven't been able to make much headway. It almost seems like Make might be caching the old timestamp of the previous version of the multi-output file?
Hopefully the following demonstrates the problem adequately.
$ cat Makefile
all: data.txt
multioutput.stamp: sourcefile.txt
	touch multioutput1.txt
	touch multioutput2.txt
	touch $@
FILES=multioutput1.txt multioutput2.txt
$(FILES): multioutput.stamp
data.txt: multioutput1.txt
	touch data.txt
$ touch sourcefile.txt
$ make
touch multioutput1.txt
touch multioutput2.txt
touch multioutput.stamp
touch data.txt
$ touch sourcefile.txt # update
$ make # data.txt is not updated!!
touch multioutput1.txt
touch multioutput2.txt
touch multioutput.stamp
$ make # except when it's run again??
touch data.txt
What am I doing wrong here, and what should I be doing instead?
You are lying to make. Don't do that.
Once make has run the recipe of a rule, it checks whether the target file has actually been updated by the recipe. If it has not changed, make does not have to remake any target that lists that file as a prerequisite.
Here you have given no recipe for multioutput1.txt, just a dependency line:
multioutput1.txt: multioutput.stamp
Make knows there is no way to update multioutput1.txt.
Cheap fix
Force make to check the dependency by supplying an explicit recipe for multioutput1.txt.
Even an empty one will do:
${FILES}: multioutput.stamp ;
Yep, that's what the ; signifies: the first line of the recipe follows on the same line (here, an empty one).
Better fix
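Before GNU make 4.3, the only way of saying to make "this recipe creates two files" was with a pattern rule; GNU make 4.3 and later also accept grouped explicit targets, written with &:.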
Then there is no need for a multioutput.stamp.
.PHONY: all
all: data.txt
%1.txt %2.txt: sourcefile.txt
	touch $*1.txt
	touch $*2.txt
data.txt: multioutput1.txt multioutput2.txt
	touch data.txt
Here $* in the recipe expands to the stem, i.e. whatever the % matched in the target name.
Why have I made data.txt depend on both multioutput files?
Here I took the view that if either of multioutput1 or multioutput2 were missing, we should probably run the recipe to create both.
YMMV.
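If you can rely on GNU make 4.3 or later, grouped explicit targets (the &: separator) are another way to say "one recipe creates both files". A minimal sketch against the question's files, assuming that version:

```make
.PHONY: all
all: data.txt

# "&:" (GNU make 4.3+) declares that a single run of this recipe
# produces both targets, so it runs once even under -j.
multioutput1.txt multioutput2.txt &: sourcefile.txt
	touch multioutput1.txt
	touch multioutput2.txt

data.txt: multioutput1.txt
	touch data.txt
```

Because the recipe really does update multioutput1.txt, touching sourcefile.txt and running make once should rebuild both outputs and then data.txt in the same run.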
Best Fix
YMMV, but I don't like pattern rules.
They are too arbitrary for my tastes.
We observe that one of multioutput1.txt and multioutput2.txt will always be younger than the other.
They will practically never have the same timestamp on a modern filesystem with sub-second timestamp resolution.
.PHONY: all
all: data.txt
multioutput2.txt: sourcefile.txt
	touch $@
	touch multioutput1.txt
multioutput1.txt: multioutput2.txt ;
data.txt: multioutput1.txt
	touch data.txt

Make: get the dependencies of a target inside another one

I often find myself trying to reference the dependency of a target (Target1) inside another one (Target2).
Consider the following Makefile. How nice a $(deps target-name) function would be!
Rule1 : file1 file2 file3
	echo $^
Rule2 : file4 file5 file6
	echo $^
clean-Rule-1-2 :
	rm $(deps Rule1) $(deps Rule2)
I found a link mentioning that one could build one's own dependency list, but it looks rather tedious.
Does anyone have a better solution (assuming none is built into make) and/or workflow tips for this issue?
Why do you want to list prerequisites of a clean-type rule? That just forces make to build those dependencies if they're out of date, only to delete them again.
There is no way to do what you want because it is not possible to be consistent about it. For example, your rules could be written like this:
Rule1 : file1 file2 file3
	echo $^
clean-Rule-1-2 : $(deps Rule1)
	rm $^
Rule1 : file10 file11 file12
Oops! When $(deps ...) is expanded, make doesn't know about the extra three prerequisites, so they won't be listed. And that's not even considering implicit rules, where make doesn't know the full prerequisite list when it parses the makefile; it only computes it when it's trying to build the target.
You don't give a real example of what you want to do, but generally the way makefiles are written is that you put the prerequisites into variables, then you can use the variables in multiple places.
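A minimal sketch of that variable-based approach, using the file names from the question (RULE1_DEPS and RULE2_DEPS are hypothetical variable names):

```make
# Each prerequisite list lives in exactly one variable and is reused
# wherever it is needed.
RULE1_DEPS := file1 file2 file3
RULE2_DEPS := file4 file5 file6

Rule1: $(RULE1_DEPS)
	echo $^

Rule2: $(RULE2_DEPS)
	echo $^

# No prerequisites here, so make does not rebuild the files only to delete them.
.PHONY: clean-Rule-1-2
clean-Rule-1-2:
	rm -f $(RULE1_DEPS) $(RULE2_DEPS)
```

If a later line adds more prerequisites, appending them to the variable (RULE1_DEPS += file10) keeps the clean rule in sync, which a separate Rule1 : file10 line would not.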

Compress Makefile Intermediates (Two ways to create same target)

I have a Makefile I'm currently using for purposes other than compiling. Instead of deleting intermediate files, I'd like to keep them but gzip them, and then later have make detect that an intermediate file exists and, instead of recomputing it, simply unzip it.
Let's suppose I have a target, target.txt, that depends on an intermediate file called intermediate.txt, which itself depends on prereq.txt. So something like:
target.txt: intermediate.txt
intermediate.txt: prereq.txt
By default, make deletes the intermediate file, but that can be disabled. Say computing intermediate.txt takes a long time, so I disable its automatic deletion. But what if intermediate.txt is also very large, so I'd like to compress it (gzip) to intermediate.txt.gz? Instead of recomputing the file, I'd like make to unzip the existing compressed file, i.e. run gunzip intermediate.txt.gz.
The larger question, I suppose, is this: I have two ways of making a target, based on two different prerequisites. I'd like make to execute the rule whose prerequisite exists and ignore the other rule, but perhaps delete the zipped version and recompute it only if the prerequisite of the intermediate has a newer timestamp. Does anyone have any suggestions?
If you are using GNU Make, you can do this with pattern rules (tac is used to represent whatever processing you're doing):
%.txt: %.i.txt
	tac $^ > $@ #make .txt file the normal way
	gzip $^ #gzip the intermediate file
%.txt: %.i.txt.gz
	gunzip < $^ | tac > $@ #make .txt by streaming in the gzipped intermediate
%.i.txt: %.p.txt
	tac $^ > $@ #make the intermediate file from the prereq
This works for pattern rules because if the .i.txt file is not found, Make falls through to the next pattern and looks for the .i.txt.gz version. This does not work for explicit rules, because later rules simply replace earlier rules.
I would guess that you do NOT want to just uncompress the .gz version if prereq.txt is newer than the gzipped file. In that case, I would tend to just use shell tests to store off and restore the gzipped file and not get make directly involved:
target.txt: intermediate.txt
	...same as it ever was...
intermediate.txt: prereq.txt
	if [ $@.gz -nt $< ]; then \
	  gunzip <$@.gz >$@; \
	else \
	  whatever >$@ <$<; \
	  gzip <$@ >$@.gz; \
	fi
where 'whatever' is the command that creates intermediate.txt from prereq.txt

telling 'make' to ignore dependencies when the top target has been created

I'm running the following kind of pipeline:
digestA: hugefileB hugefileC
	cat $^ > $@
	rm $^
hugefileB:
	touch $@
hugefileC:
	touch $@
The targets hugefileB and hugefileC are very big and take a long time to compute (and need the power of Make). But once digestA has been created, there is no need to keep its dependencies: it deletes those dependencies to free up disk space.
Now, if I invoke 'make' again, hugefileB and hugefileC will be rebuilt, whereas digestA is already ok.
Is there any way to tell 'make' not to rebuild the dependencies?
NOTE: I don't want to build the two dependencies inside the rule for 'digestA'.
Use the "intermediate files" feature of GNU make:
Intermediate files are remade using their rules just like all other files. But intermediate files are treated differently in two ways.
The first difference is what happens if the intermediate file does not exist. If an ordinary file b does not exist, and make considers a target that depends on b, it invariably creates b and then updates the target from b. But if b is an intermediate file, then make can leave well enough alone. It won't bother updating b, or the ultimate target, unless some prerequisite of b is newer than that target or there is some other reason to update that target.
The second difference is that if make does create b in order to update something else, it deletes b later on after it is no longer needed. Therefore, an intermediate file which did not exist before make also does not exist after make. make reports the deletion to you by printing a rm -f command showing which file it is deleting.
Ordinarily, a file cannot be intermediate if it is mentioned in the makefile as a target or prerequisite. However, you can explicitly mark a file as intermediate by listing it as a prerequisite of the special target .INTERMEDIATE. This takes effect even if the file is mentioned explicitly in some other way.
You can prevent automatic deletion of an intermediate file by marking it as a secondary file. To do this, list it as a prerequisite of the special target .SECONDARY. When a file is secondary, make will not create the file merely because it does not already exist, but make does not automatically delete the file. Marking a file as secondary also marks it as intermediate.
So, adding the following line to the Makefile should be enough:
.INTERMEDIATE : hugefileB hugefileC
Invoking make for the first time:
$ make
touch hugefileB
touch hugefileC
cat hugefileB hugefileC > digestA
rm hugefileB hugefileC
And the next time:
$ make
make: `digestA' is up to date.
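The .SECONDARY alternative described in the quoted manual text can also be sketched, assuming you would rather keep deleting the files yourself: secondary files are not recreated merely because they are missing, and make never deletes them automatically, so the explicit rm $^ stays in the recipe.

```make
digestA: hugefileB hugefileC
	cat $^ > $@
	rm $^

hugefileB:
	touch $@

hugefileC:
	touch $@

# Secondary: not rebuilt just because it is missing, never auto-deleted.
.SECONDARY: hugefileB hugefileC
```

The difference from .INTERMEDIATE is only who does the deleting: here it is the recipe, so removing the rm line would keep the huge files around.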
If you mark hugefileB and hugefileC as intermediate files, you will get the behavior you want:
digestA: hugefileB hugefileC
	cat $^ > $@
hugefileB:
	touch $@
hugefileC:
	touch $@
.INTERMEDIATE: hugefileB hugefileC
For example:
$ gmake
touch hugefileB
touch hugefileC
cat hugefileB hugefileC > digestA
rm hugefileB hugefileC
$ gmake
gmake: `digestA' is up to date.
$ rm -f digestA
$ gmake
touch hugefileB
touch hugefileC
cat hugefileB hugefileC > digestA
rm hugefileB hugefileC
Note that you do not need the explicit rm $^ command anymore -- gmake automatically deletes intermediate files at the end of the build.
I would recommend creating pseudo-cache files that are created by the hugefileB and hugefileC targets.
Then have digestA depend on those cache files, because you know they will not change again until you manually invoke the expensive targets.
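One way to read that suggestion, as a minimal sketch: each expensive rule also writes a small stamp file (the .done names are hypothetical), digestA depends on the stamps, and the huge files can then be deleted without making digestA look out of date.

```make
digestA: hugefileB.done hugefileC.done
	cat hugefileB hugefileC > $@
	rm -f hugefileB hugefileC

# Each stamp records that its huge file was computed; the stamps are
# tiny, so they are kept after the huge files are removed.
hugefileB.done:
	touch hugefileB
	touch $@

hugefileC.done:
	touch hugefileC
	touch $@
```

The catch: if digestA is deleted while the stamps remain, its recipe fails because the huge files are gone, so delete the stamps as well to force a full rebuild.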
See also .PRECIOUS:
.PRECIOUS : hugefileB hugefileC
.PRECIOUS
The targets which .PRECIOUS depends on are given the following special treatment: if make is killed or interrupted during the execution of their recipes, the target is not deleted. See Interrupting or Killing make. Also, if the target is an intermediate file, it will not be deleted after it is no longer needed, as is normally done. See Chains of Implicit Rules. In this latter respect it overlaps with the .SECONDARY special target.
You can also list the target pattern of an implicit rule (such as ‘%.o’) as a prerequisite file of the special target .PRECIOUS to preserve intermediate files created by rules whose target patterns match that file’s name.
Edit: On re-reading the question, I see that you don't want to keep the hugefiles; maybe do this:
digestA : hugefileB hugefileC
	grep '^Subject:' $^ > $@
	for n in $^; do echo > $$n; done
	sleep 1; touch $@
It truncates the hugefiles after using them, then touches the output file a second later, just to ensure that the output is newer than the input and this rule won't run again until the empty hugefiles are removed.
Unfortunately, if only the digest is removed, then running this rule will create an empty digest. You'd probably want to add code to block that.
The correct way is to not delete the files, as that removes the information that make uses to determine whether to rebuild the files.
Recreating them as empty does not help because make will then assume that the empty files are fully built.
If there is a way to merge digests, then you could create one from each of the huge files, which is then kept, and the huge file automatically removed as it is an intermediate.
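A minimal sketch of that idea, assuming digests can be merged by simple concatenation; md5sum and cat are stand-ins for the real digest and merge commands, and the .digest suffix is hypothetical:

```make
# Merge the per-file digests (stand-in: concatenation).
digestA: hugefileB.digest hugefileC.digest
	cat $^ > $@

# One small digest per huge file; these are kept.
%.digest: %
	md5sum $< > $@

hugefileB hugefileC:
	touch $@

# The huge files are deleted once their digests exist, and are not
# rebuilt merely because they are missing.
.INTERMEDIATE: hugefileB hugefileC
```

A side benefit of this layout: if only hugefileC.digest is deleted, make should recompute just hugefileC, not hugefileB.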

Makefile rule depending on change of number/titles of files instead of change in content of files

I'm using a makefile to automate some document generation. I have several documents in a directory, and one of my makefile rules generates an index page of those files. The list of files itself is loaded on the fly using list := $(shell ls documents/*.txt), so I don't have to bother manually editing the makefile every time I add, delete, or rename a document. Naturally, I want the index-generation rule to trigger when the number or names of the files in the documents directory change, but I don't know how to set up the prerequisites to work this way.
I could use .PHONY or something similar to force the index-generation to run all the time, but I'd rather not waste the cycles. I tried piping ls to a file list.txt and using that as a prerequisite for my index-generation rule, but that would require either editing list.txt manually (trying to avoid it), or auto-generating it in the makefile (this changes the creation time, so I can't use list.txt in the prerequisite because it would trigger the rule every time).
If you need a dependency on the number of files, then... why not just depend on the number itself? The number will be represented as a dummy file that is created when the specified number of files is in the documents directory.
NUMBER=$(shell ls documents/*.txt | wc -l).files
# This yields name like 2.files, 3.files, etc...
# .PHONY $(NUMBER) -- NOT a phony target!
$(NUMBER):
	rm -f *.files # Remove previous trigger
	touch $(NUMBER)
index.txt: $(NUMBER)
	...generate index.txt...
While the number of files is one property to track, you may instead depend on a hash of the directory listing. It's very unlikely that the hash will be the same for two different listings that occur in your workflow. Here's an example:
NUMBER=$(shell ls -l documents/*.txt | md5sum | sed 's/[[:space:]].*//').files
Note the use of -l: this way you depend on the full listing of the files, which includes modification times, sizes and file names. But if you don't need that, you may drop the option.
Note: sed is needed because md5sum prints something after the hash itself (for standard input, a "-" standing in for the file name).
You can put a dependency on the directory itself for the list file, e.g.
list.txt: documents
	ls documents/*.txt 2>/dev/null > $@ || true
Every time you add or remove a file in the documents directory, the directory's timestamp will be altered and make will do the right thing.
Here's a solution that updates the index if and only if the set of files has changed:
list.txt.tmp: documents
	ls $</*.txt > $@
list.txt: list.txt.tmp
	cmp -s $< $@ || cp $< $@
index.txt: list.txt
	...generate index.txt...
Thanks to the "cmp || cp", the modification time of "list.txt" (which is what make compares) does not change unless the output of "ls" has changed.
