Makefile rule which can choose to update output

I have a metadata dotfile which I store alongside my code (in each dir of a large project). Each metadata file lists the files in its directory that satisfy a particular constraint (not being auto-generated). The metadata file is sincluded in the Makefile.
I want other targets (the auto-generated files) to depend on this metadata file. If the list of "real" code files in a dir ever changes, the metadata file should be updated, which will in turn cause the auto-generated files (plural) to be re-made.
I have a rule for the metadata file, and when that rule fires make restarts correctly. But I can't quite figure out how to describe what I want. I want the rule to always run, but make should only consider $@ as having changed if I actually touch the file. I can't use the timestamp of the dir, because auto-generating file A changes that timestamp, which triggers the need to re-generate file B, which changes the timestamp again ...
I feel like I am missing something obvious, but I can not put my finger on it...
all: prep my-bin
# For demonstration purposes.
prep: real-code.foobar
real-code.foobar:
	@touch real-code.foobar
my-bin: meta real-code.foobar genfile-A.foobar genfile-B.foobar genfile-C.foobar
	touch $@
meta: .
	F=$$(ls *.foobar | grep -v genfile); \
	echo "FILES := $$F" > $@
sinclude meta
genfile-A.foobar: meta
	touch $@
genfile-B.foobar: meta
	touch $@
genfile-C.foobar: meta
	touch $@
clean:
	rm -f *.foobar my-bin meta

You could update the meta in a shell script at the beginning of the Makefile:
$(shell ls *.foobar | grep -v genfile > meta.tmp; \
cmp -s meta.tmp meta && rm meta.tmp || mv meta.tmp meta)
This way the timestamp of meta changes only when its content changes, and the update happens while make is still reading the Makefile, before it has decided which rules to run (so targets that depend on meta are not rebuilt unless the list really changed).
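The "replace only if the content changed" step can also be kept in a small shell helper. The following is a minimal sketch, not part of the answer itself; the function name `update_if_changed` and the temporary file naming are assumptions made for illustration.

```shell
#!/bin/sh
# update_if_changed FILE
# Reads the new content from stdin and replaces FILE only when the content
# differs, so FILE keeps its old timestamp after a no-op update.
# (Hypothetical helper name; not taken from the answer.)
update_if_changed() {
    target=$1
    tmp="$target.tmp.$$"
    cat > "$tmp"
    if [ -f "$target" ] && cmp -s "$tmp" "$target"; then
        rm -f "$tmp"             # identical content: leave the timestamp alone
    else
        mv -f "$tmp" "$target"   # new or changed content: replace the file
    fi
}
```

A call such as `ls *.foobar | grep -v genfile | update_if_changed meta` would then stand in for the pipeline inside `$(shell ...)`.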
John

Related

Remake a file if it has changed

I have a simple Makefile that will produce a file
all: build/foo.bin
build/foo.bin: foo.c
	gcc $< -o $@
Works great and produces build/foo.bin as expected. If I then run make again it says make: Nothing to be done for 'all'. That's expected.
I then do rm build/foo.bin && make and it rebuilds the file. But if I do echo "Modified" > build/foo.bin, make doesn't think that anything has changed: make: Nothing to be done for 'all'.
How can I write the rules of the Makefile to re-create the build/foo.bin if the binary ever gets modified outside of the Makefile?
Make compares timestamps between two files. If the target file exists and its timestamp is newer than all of the prerequisites' timestamps, then make decides the target is up to date and it doesn't need to do anything.
Make doesn't maintain some kind of database of timestamps on its own: it relies on the filesystem for that. So make cannot detect when a file changes from what it previously contained. It can only detect when some other file changed after the target file was last updated.
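As a rough illustration of that timestamp comparison, here is a minimal shell sketch. The function name `needs_rebuild` is hypothetical, and the check is a simplification of what make does (it ignores missing prerequisites, phony targets, and so on).

```shell
#!/bin/sh
# needs_rebuild TARGET PREREQ...
# Succeeds (exit 0) when TARGET is missing or any PREREQ has a newer
# modification time than TARGET; fails (exit 1) otherwise.
# Hypothetical helper, only approximating make's up-to-date check.
needs_rebuild() {
    target=$1
    shift
    [ -e "$target" ] || return 0
    for prereq in "$@"; do
        [ "$prereq" -nt "$target" ] && return 0
    done
    return 1
}
```

Overwriting build/foo.bin by hand only makes the target newer, so a check like `needs_rebuild build/foo.bin foo.c` still reports that nothing needs to be done.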
In short, make cannot do what you want it to do, using its standard methods.
If you want to do that you'll have to get complicated and create a way to turn the behavior you want to detect into a file with a timestamp, that make can compare.
One way to do this would be to keep the md5sum of the file in another file, then compare it and update the file only if it's changed. You can try this (I didn't test it):
build/foo.bin: foo.c checksum.out
	gcc $< -o $@
	md5sum $@ > checksum.out
	touch $@
checksum.out: FORCE
	md5sum build/foo.bin > checksum.tmp; cmp $@ checksum.tmp || cp checksum.tmp $@
FORCE: ;
Basically, the FORCE is there to require the md5sum check to always run, but then if the checksum doesn't actually change it doesn't update the output file which means that build/foo.bin won't be rebuilt (at least not because checksum.out is updated).

How to use makefile to trigger data processing when input files changed?

I would like to have data processing performed when launching make and one or more files in the input directory have changed. With my current Makefile processing is not triggered.
Let's say, I'm setting up an exemplary project: two directories, one with data files. Use this script:
#!/bin/bash
mkdir -p proj/input
mkdir -p proj/output
cd proj/input
echo "a2018" > 2018a.txt
echo "b2018" > 2018b.txt
echo "a2019" > 2019a.txt
echo "b2019" > 2019b.txt
My Makefile:
#proj/Makefile
RAW := $(shell find input -name "*.txt")
OUT := ./output
.PHONY: all
all: $(OUT)/*.txt
	echo "Running processing of raw files"
$(OUT)/2018merged.txt: $(RAW)
	./merge.sh 2018
$(OUT)/2019merged.txt: $(RAW)
	./merge.sh 2019
The merge script does basic concatenation of files by year from the filename and saves the result in the file in the output directory.
#!/bin/bash
# proj/merge.sh
echo "-- merging files for $1"
cat input/$1*.txt > "./output/$1merged.txt"
I believed that providing all files in the input directory as a prerequisite will be sufficient but apparently I'm doing something wrong.
I found a few questions about similar problems, and partial solutions might be there:
Processing multiple files generated from single input,
Make: How to process many input files in one invocation of a tool?,
Make dummy target that checks the age of an existing file?.
Consider this rule:
all: $(OUT)/*.txt
echo "Running processing of raw files"
This says, all depends on all the files in the $(OUT) directory that match the pattern *.txt.
Well, of course before you've run your makefile there are no files: that's the whole point of your makefile to create them. So when you first run make that pattern expands to nothing, and thus there are no prerequisites for all, and thus nothing is done.
If you want to construct the list of targets to build you have to do it based on the source files, which will always exist, or in this case where they don't match up exactly you have to list them explicitly:
YEARS = 2018 2019
all: $(patsubst %,$(OUT)/%merged.txt,$(YEARS))
	echo "Running processing of raw files"

Makefile: run only when the input is modified or has not been processed yet

I need to write a makefile to run some programs. Every time I run it, all the files are processed even if they have not changed. I'm sure there is a problem in my code, but I don't understand where I made the mistake.
RDIR=RAW
OUTDIR=Fusion_res/kallisto
RFILES:=$(wildcard $(RDIR)/*_R1_001.fastq.gz)
DATABASE=/home/sbsuser/databases/Kallsto_hg38_87
OUTFILE=$(patsubst %_R1_001.fastq.gz,%_R2_001.fastq.gz,$(RFILES))
OUTKAL=$(patsubst $(RDIR)/%_R1_001.fastq.gz,$(OUTDIR)/%,$(RFILES))
.PHONY: clean all
all: $(OUTFILE) $(RFILES) $(OUTDIR) $(OUTKAL)
#$(OUTKAL) $(OUTFILE): $(RDIR)/%._R1_001.fastq.gz
# echo "kallisto quant -i" $(DATABASE)/transcripts.idx -b 100 -o $@ --fusion $< $(OUTFILE)
$(OUTDIR)/%: $(RDIR)/%_R1_001.fastq.gz $(OUTFILE)
	kallisto quant -i $(DATABASE)/transcripts.idx -b 100 --fusion --rf-stranded -o $@ $(RDIR)/$*_R1_001.fastq.gz $(RDIR)/$*_R2_001.fastq.gz
$(OUTDIR):
	mkdir -p $(OUTDIR)
clean::
	$(RM) -rf $(OUTDIR)
I expected make to execute the command only when it finds a change in the input or output files. I don't know why it forces a re-run every time. In some cases that is what I want, but when there is new input I also want to process only that.
Thanks so much
A couple of things:
1) $(OUTDIR)/% is dependent on $(OUTFILE) (which is a list of all outfiles). Therefore if you change any one of the OUTFILEs, you make everything in $(OUTDIR)/% obsolete. I believe what you want is this:
$(OUTDIR)/%_R1_001.fastq.gz: $(RDIR)/%_R2_001.fastq.gz
	.... (rules to make out/R1 from raw/R2)
$(RDIR)/%_R2_001.fastq.gz: $(RDIR)/%_R1_001.fastq.gz
	.... (rules to make R2 from R1)
This makes each file dependent only on the files that affect it.
2) you have the target all dependent on $(OUTDIR) which is a directory. If you use parallel make, it may generate the $(OUTDIR) after it generates the other dependencies of all: (some of which would depend on $(OUTDIR) being created). What you want there is to remove all's dependency on $(OUTDIR), and add the line:
$(OUTFILE) : | $(OUTDIR)
Notice the |, which means order-only (don't consider $(OUTFILE) out of date if $(OUTDIR) is newer). This is important, as a directory's timestamp is updated each time a file is added to or removed from the directory, and so it tends to be newer than its contents.
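The directory-timestamp behaviour is easy to observe from a shell. This is a small sketch under assumptions: the function name, the `out` directory, and the `stamp` file are hypothetical, and the dates are arbitrary values chosen so the comparison does not depend on clock resolution.

```shell
#!/bin/sh
# demo_dir_mtime: show that creating a file inside a directory makes the
# directory newer than a file that used to be newer than it.
# All names here are hypothetical and exist only for the demonstration.
demo_dir_mtime() {
    work=$(mktemp -d) || return 1
    mkdir "$work/out"
    touch -t 200101010000 "$work/out"      # pretend the directory is old
    touch -t 201001010000 "$work/stamp"    # a file newer than the directory
    [ "$work/stamp" -nt "$work/out" ] && echo "before: directory older than stamp"
    : > "$work/out/result.txt"             # adding an entry updates the directory's mtime
    [ "$work/out" -nt "$work/stamp" ] && echo "after: directory newer than stamp"
    rm -rf "$work"
}
demo_dir_mtime
```

With a normal prerequisite, that jump in the directory's timestamp would mark every target depending on it as out of date, which is what the order-only form avoids.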

Why does this makefile recipe always run?

My Makefile downloads a number of third-party files if they are not locally available.
CLOSURE_VERSION=20161024
CLOSURE_BASE_URL="http://dl.google.com/closure-compiler"
build/bin/closure-compiler.jar: build/src/hashes/closure-compiler-${CLOSURE_VERSION}.tar.gz.sha256
	download-if-sha-matches <$< >$@.tar.gz \
		${CLOSURE_BASE_URL}/compiler-${CLOSURE_VERSION}.tar.gz
	tar -zxf $@.tar.gz closure-compiler-v${CLOSURE_VERSION}.jar
	mv closure-compiler-v${CLOSURE_VERSION}.jar $@
	rm $@.tar.gz
Here, build/src/hashes/closure-compiler-${CLOSURE_VERSION}.tar.gz.sha256 is the saved hash of the version of the file which we already know is correct.
download-if-sha-matches <hash >outfile url downloads the url and compares its hash to stdin, failing if they don't match.
This recipe works except that it always runs, even if build/bin/closure-compiler.jar already exists. Naturally, its timestamp is later than that of $< so I would expect this to not execute the recipe the second time I run make.
What have I gotten wrong?
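It looks like tar -x preserves the timestamps of the archived files, so the extracted jar can end up older than its prerequisite and make keeps treating it as out of date.
Add this to the end of the recipe:
	touch $@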
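The effect can be reproduced outside of make. Below is a minimal sketch; the file names `tool.jar`, `bundle.tar`, and `hash.sha256` are hypothetical stand-ins for the jar, the downloaded archive, and the hash file, and the dates are arbitrary.

```shell
#!/bin/sh
# demo_tar_mtime: extract an archive member that carries an old mtime and
# compare it with a newer "prerequisite", before and after touch.
# File names are hypothetical stand-ins for the files in the question.
demo_tar_mtime() {
    work=$(mktemp -d) || return 1
    (
        cd "$work" || exit 1
        echo payload > tool.jar
        touch -t 200101010000 tool.jar       # member gets an old mtime
        tar -cf bundle.tar tool.jar
        rm tool.jar
        touch -t 201001010000 hash.sha256    # prerequisite, newer than the member
        tar -xf bundle.tar tool.jar          # extraction restores the old mtime
        [ hash.sha256 -nt tool.jar ] && echo "after tar -x: target older than prerequisite"
        touch tool.jar                       # the fix suggested in the answer
        [ tool.jar -nt hash.sha256 ] && echo "after touch: target newer than prerequisite"
    )
    rm -rf "$work"
}
demo_tar_mtime
```

Since mv keeps the modification time as well, the renamed jar in the original recipe would inherit the archive's timestamp in the same way.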

gnu make reloads includes but doesn't update the targets

I'm trying to create a Makefile that will download and process a file to generate targets. This is a simplified version:
default: all
.PHONY: all clean filelist.d
clean:
	@rm -fv *.date *.d
#The actual list comes from a FTP file, but let's simplify things a bit
filelist.d:
	@echo "Getting updated filelist..."
	@echo "LIST=$(shell date +\%M)1.date $(shell date +\%M)2.date" > $@
	@echo 'all: $$(LIST)' >> $@
%.date:
	touch $@
-include filelist.d
Unfortunately the target all doesn't get updated properly on the first run, it needs to be run again to get the files. This is the output I get from it:
$ make
Getting updated filelist...
make: Nothing to be done for `default'.
$ make
Getting updated filelist...
touch 141.date
touch 142.date
touch 143.date
I'm using GNU Make 3.81 whose documentation states that it reloads the whole thing if the included files get changed. What is going wrong?
You have specified filelist.d as a .PHONY target, so make believes making that target doesn't actually update the specified file. However, it does, and the new contents are used on the next run. For the first run, the missing file isn't an error because include is prefixed with the dash.
Remove filelist.d from .PHONY. However, remember it won't be regenerated again until you delete it (as it doesn't depend on anything).
By the same token, you should include "default" in .PHONY.
I wrote a shell script rather than lump all this in the makefile:
#!/bin/bash
# Check whether file $1 is less than $2 days old.
[ $# -eq 2 ] || {
    echo "Usage: $0 FILE DAYS" >&2
    exit 2
}
FILE="$1"
DAYS="$2"
[ -f "$FILE" ] || exit 1 # doesn't exist or not a file
TODAY=$(date +%s)
TARGET=$(($TODAY - ($DAYS * 24 * 60 * 60)))
MODIFIED=$(date -r "$FILE" +%s)
(($TARGET < $MODIFIED))
Replace X with the max number of days that can pass before filelist.d is downloaded again:
filelist.d: force-make
	./less-than-days $@ X || command-to-update
.PHONY: force-make
force-make:
Now filelist.d depends on a .PHONY target, without being a phony itself. This means filelist.d is always out of date (phony targets are always "new"), but its recipe only updates the file periodically.
Unfortunately, this requires you to write the update command as a single command, and space may be a problem if it is long. In that case, I would put it in a separate script as well.
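If a separate script feels heavy, a similar age check can be approximated with find. This is only a sketch and not the script from the answer: the function name `less_than_days_old` is hypothetical, and find counts whole 24-hour periods, so the boundary may differ slightly from the arithmetic above.

```shell
#!/bin/sh
# less_than_days_old FILE DAYS
# Succeeds when FILE is a regular file modified fewer than DAYS days ago.
# Hypothetical helper; relies on find's -mtime test, which works in whole
# 24-hour periods.
less_than_days_old() {
    [ -f "$1" ] || return 1
    # find prints the path only when the -mtime test matches
    [ -n "$(find "$1" -mtime "-$2")" ]
}
```

In the recipe this could be used as `less_than_days_old $@ X || command-to-update`, provided the function is defined in the same shell invocation as the recipe line.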

Resources